Different Implementations of the Wide-Character Literal L"xx" in VC6.0/7.0 and GNU g++


Source: Internet | Compiled by: 软晨网 (RuanChen.com) | Published: 2009-10-30


Author: 乾坤一笑

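Preface: This article grew out of a discussion with 周星星 on the VCKBASE C++ forum, which pushed me to trace the question back to its source and find both the theoretical basis and practical proof. (Some of the material and test code in this article was provided by 周星星.)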

The C++ Programming Language, 3rd edition, contains the following two passages:

from 4.3:
A type wchar_t is provided to hold characters of a larger character set such as Unicode. It is a distinct type. The size of wchar_t is implementation-defined and large enough to hold the largest character set supported by the implementation's locale (see §21.7, §C.3.3). The strange name is a leftover from C. In C, wchar_t is a typedef (§4.9.7) rather than a builtin type. The suffix _t was added to distinguish standard typedefs.

from 4.3.1:
Wide character literals are of the form L'ab', where the number of characters between the quotes and their meanings is implementation-defined to match the wchar_t type. A wide character literal has type wchar_t.

Two points in these passages matter here:

The size of wchar_t is implementation-defined;
The meaning of L"ab" is implementation-defined.
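The first point can be checked directly on any compiler. The following is a minimal sketch, not part of the original article: hex_dump and describe_wide_literal are hypothetical helper names, and the code assumes a C++11 compiler. wchar_t is 2 bytes with Microsoft's compilers and typically 4 bytes with g++ on Linux, so the same literal occupies a different number of bytes on each.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Return the n bytes starting at data formatted as " XX XX ...".
// (hypothetical helper, introduced only for this illustration)
std::string hex_dump(const void* data, std::size_t n)
{
    const unsigned char* p = static_cast<const unsigned char*>(data);
    std::string out;
    char buf[4];  // space + two hex digits + terminating NUL
    for (std::size_t i = 0; i < n; ++i) {
        std::snprintf(buf, sizeof buf, " %02X", static_cast<unsigned>(p[i]));
        out += buf;
    }
    return out;
}

// Describe how this implementation stores the wide literal L"VC".
std::string describe_wide_literal()
{
    const wchar_t w[] = L"VC";  // two characters plus the terminator
    return "sizeof(wchar_t) = " + std::to_string(sizeof(wchar_t)) +
           ", L\"VC\" =" + hex_dump(w, sizeof w);
}
```

Printing the result of describe_wide_literal() from main shows the unit size and raw storage for the current compiler; the exact text depends on the platform.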

So how do GNU g++ and VC6.0/7.0 each implement this? Consider the following code:

    // author: **.Zhou
    #include <stdio.h>
    #include <stdlib.h>
    #include <windows.h>

    // Print the n bytes starting at padd in hexadecimal.
    void prt( const void* padd, size_t n )
    {
        const unsigned char* p = static_cast<const unsigned char*>( padd );
        const unsigned char* pe = p + n;
        for( ; p<pe; ++p ) printf( " %02X", *p );
        printf( "\n" );
    }

    int main()
    {
        // The source file is saved in GB2312/GBK, so each Chinese character
        // in the narrow string occupies two bytes.
        char a[] = "VC知识库";
        wchar_t b[] = L"VC知识库";

        prt( a, sizeof(a) );
        prt( b, sizeof(b) );
        system( "Pause" );
        // Notes:
        // Dev-CPP4990 prints:
        //   56 43 D6 AA CA B6 BF E2 00
        //   56 00 43 00 D6 00 AA 00 CA 00 B6 00 BF 00 E2 00 00 00
        // VC++6.0 and VC.net2003 print:
        //   56 43 D6 AA CA B6 BF E2 00
        //   56 00 43 00 E5 77 C6 8B 93 5E 00 00
        // So in Dev-CPP, L"" is not Unicode-encoded; each byte is simply
        // widened, and a Chinese character takes 4 bytes of storage.

        // "计算器" is the window title of Calculator on Chinese Windows.
        HWND h = FindWindow( NULL, "计算器" );
        SetWindowTextA( h, a );
        system( "Pause" );
        SetWindowTextW( h, b );
        system( "Pause" );
        // Notes:
        // VC++6.0 and VC.net2003 both change the title to "VC知识库" successfully.
        // With Dev-CPP4990 only SetWindowTextA displays correctly;
        // SetWindowTextW displays garbage.
        return 0;
    }

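This code shows that g++ (Dev-CPP uses the MinGW compiler) interprets L"xx" by widening the non-wide-char string "xx" to wchar_t one byte at a time, padding the high-order bits with zeros. VC6.0, by contrast, interprets L"xx" by converting "xx" from MBCS to Unicode WCHAR. MBCS currently uses char as its storage unit, and WCHAR is defined in winnt.h as typedef wchar_t WCHAR. On Windows, any char value outside the range 0~127 is treated as part of an MBCS character, which consists of one or two bytes, and the MBCS character set depends on the code page of the locale. On a given Windows system, the default code page can be set in Control Panel -> Regional Options.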
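The two interpretations can be reproduced without Windows by modelling them on plain arrays. This is only a sketch under stated assumptions: zero_extend, dump_le16, gb2312_sample and utf16_sample are hypothetical names introduced here, wide units are modelled as 16-bit values, and the byte values are taken from the output quoted in the program above. It models the observed results, not the compilers' actual internals.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Model of the Dev-CPP result: every byte of the narrow string becomes
// one 16-bit unit whose high byte is zero.
std::vector<std::uint16_t> zero_extend(const std::vector<unsigned char>& narrow)
{
    std::vector<std::uint16_t> wide;
    for (std::size_t i = 0; i < narrow.size(); ++i)
        wide.push_back(static_cast<std::uint16_t>(narrow[i]));
    return wide;
}

// Format 16-bit units as bytes in little-endian order (low byte first),
// which is how prt() shows them on x86.
std::string dump_le16(const std::vector<std::uint16_t>& units)
{
    std::string out;
    char buf[8];  // " XX XX" plus terminating NUL
    for (std::size_t i = 0; i < units.size(); ++i) {
        std::snprintf(buf, sizeof buf, " %02X %02X",
                      static_cast<unsigned>(units[i] & 0xFF),
                      static_cast<unsigned>(units[i] >> 8));
        out += buf;
    }
    return out;
}

// "VC知识库" as GB2312 bytes, including the terminating zero.
std::vector<unsigned char> gb2312_sample()
{
    static const unsigned char bytes[] =
        { 0x56, 0x43, 0xD6, 0xAA, 0xCA, 0xB6, 0xBF, 0xE2, 0x00 };
    return std::vector<unsigned char>(bytes, bytes + sizeof bytes);
}

// The same text as UTF-16 code units, i.e. what the VC result contains.
std::vector<std::uint16_t> utf16_sample()
{
    static const std::uint16_t units[] =
        { 0x0056, 0x0043, 0x77E5, 0x8BC6, 0x5E93, 0x0000 };
    return std::vector<std::uint16_t>(units, units + sizeof units / sizeof units[0]);
}
```

dump_le16(zero_extend(gb2312_sample())) yields the 18-byte Dev-CPP line, while dump_le16(utf16_sample()) yields the 12-byte VC line. The widened form is not valid UTF-16 for the intended text, which is consistent with SetWindowTextW showing garbage.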

The following program can be used to verify this conclusion:

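    // author: smileonce
    #include <stdio.h>
    #include <stdlib.h>
    #include <assert.h>
    #include <windows.h>

    // Print the n bytes starting at padd in hexadecimal.
    void prt( const void* padd, size_t n )
    {
        const unsigned char* p = static_cast<const unsigned char*>( padd );
        const unsigned char* pe = p + n;
        for( ; p<pe; ++p ) printf( " %02X", *p );
        printf( "\n" );
    }

    int main()
    {
        char a[] = "VC知识库";
        wchar_t b[] = L"VC知识库";

        prt( a, sizeof(a) );
        prt( b, sizeof(b) );

        PSTR pMultiByteStr = (PSTR)a;
        PWSTR pWideCharStr;
        int nLenOfWideCharStr;

        // Use the API function MultiByteToWideChar() to convert a to Unicode.
        // The first call only queries the required length in WCHARs
        // (including the terminator, because the input length is -1).
        nLenOfWideCharStr = MultiByteToWideChar( CP_ACP, 0, pMultiByteStr, -1, NULL, 0);
        pWideCharStr = (PWSTR)HeapAlloc( GetProcessHeap(), 0, nLenOfWideCharStr * sizeof(WCHAR) );
        assert( pWideCharStr );
        MultiByteToWideChar( CP_ACP, 0, pMultiByteStr, -1,
            // The source text breaks off at this point. The remaining
            // arguments and the lines below are a reconstruction (an
            // assumption), not the author's original code.
            pWideCharStr, nLenOfWideCharStr );

        prt( pWideCharStr, nLenOfWideCharStr * sizeof(WCHAR) );
        HeapFree( GetProcessHeap(), 0, pWideCharStr );
        return 0;
    }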
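When checking a conversion result such as the buffer filled by MultiByteToWideChar, it helps to have the expected value written in a form that does not depend on how the compiler decodes the source file. The following is a minimal sketch, assuming a compiler that supports universal-character-names in wide literals. expected_title and matches_expected are hypothetical helper names, not from the original article.

```cpp
#include <cwchar>

// "VC知识库" spelled with universal-character-names, so the stored code
// points do not depend on the encoding of the source file.
const wchar_t* expected_title()
{
    return L"VC\u77E5\u8BC6\u5E93";
}

// Compare a converted wide string against the expected text.
bool matches_expected(const wchar_t* converted)
{
    return std::wcscmp(converted, expected_title()) == 0;
}
```

On a Windows system whose default code page is 936, matches_expected(pWideCharStr) would be expected to return true after the conversion above. This has not been run as part of this article.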