The code points of some Unicode characters (such as 𤭢, U+24B62) take more than 2 bytes. How do I use Win32 API functions like CreateFile() with these characters?

WinBase.h

WINBASEAPI
__out
HANDLE
WINAPI
CreateFileA(
    __in     LPCSTR lpFileName,
    __in     DWORD dwDesiredAccess,
    __in     DWORD dwShareMode,
    __in_opt LPSECURITY_ATTRIBUTES lpSecurityAttributes,
    __in     DWORD dwCreationDisposition,
    __in     DWORD dwFlagsAndAttributes,
    __in_opt HANDLE hTemplateFile
    );
WINBASEAPI
__out
HANDLE
WINAPI
CreateFileW(
    __in     LPCWSTR lpFileName,
    __in     DWORD dwDesiredAccess,
    __in     DWORD dwShareMode,
    __in_opt LPSECURITY_ATTRIBUTES lpSecurityAttributes,
    __in     DWORD dwCreationDisposition,
    __in     DWORD dwFlagsAndAttributes,
    __in_opt HANDLE hTemplateFile
    );
#ifdef UNICODE
#define CreateFile  CreateFileW
#else
#define CreateFile  CreateFileA
#endif // !UNICODE

LPCSTR and LPCWSTR are defined in WinNT.h as:

typedef __nullterminated CONST CHAR *LPCSTR, *PCSTR;
typedef __nullterminated CONST WCHAR *LPCWSTR, *PCWSTR;

CHAR and WCHAR are defined in WinNT.h as:

typedef char CHAR;
#ifndef _MAC
typedef wchar_t WCHAR;    // wc,   16-bit UNICODE character
#else
// some Macintosh compilers don't define wchar_t in a convenient location, or define it as a char
typedef unsigned short WCHAR;    // wc,   16-bit UNICODE character
#endif

CreateFileA() accepts LPCSTR file names, which are stored internally in an array of 8-bit char elements.
CreateFileW() accepts LPCWSTR file names, which are stored internally in an array of 16-bit wchar_t elements.

I have created a file at the path C:\𤭢.txt. It looks like it is not possible to open this file using CreateFile(), because its name contains a character whose Unicode code point is 0x24B62, which doesn't fit even in a single WCHAR array element.

But that file exists on my hard disk and Windows manages it normally. How do I open this file with a Win32 API function, like Windows does internally?

hkBattousai

1 Answer

Such characters are represented by UTF-16 surrogate pairs. It takes two wide character elements to represent that code point. So, you just need to call CreateFile passing the necessary surrogate pair. And naturally you need to use the wide variant of CreateFile.
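To make the surrogate pair idea concrete, here is a minimal sketch in C. It encodes the code point from the question (0x24B62) as UTF-16 and builds the path `C:\<that character>.txt` as an array of 16-bit units. The helper names `utf16_encode`, `build_example_path`, and `open_example_file` are made up for this illustration, and the access and sharing flags passed to `CreateFileW` are just one plausible choice. The `CreateFileW` call is only compiled on Windows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: encode one Unicode code point as UTF-16.
   Writes 1 or 2 code units to out and returns how many were written. */
size_t utf16_encode(uint32_t cp, uint16_t *out)
{
    /* Only valid scalar values: at most U+10FFFF and not itself a surrogate. */
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;                  /* fits in a single code unit */
        return 1;
    }
    cp -= 0x10000;                              /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));   /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF)); /* low surrogate: bottom 10 bits */
    return 2;
}

/* Hypothetical helper: build the NUL-terminated UTF-16 path "C:\<cp>.txt".
   buf must have room for at least 10 code units.
   Returns the length in code units, not counting the terminator. */
size_t build_example_path(uint32_t cp, uint16_t *buf)
{
    size_t n = 0;
    buf[n++] = 'C';
    buf[n++] = ':';
    buf[n++] = '\\';
    n += utf16_encode(cp, &buf[n]);
    buf[n++] = '.';
    buf[n++] = 't';
    buf[n++] = 'x';
    buf[n++] = 't';
    buf[n] = 0;
    return n;
}

#ifdef _WIN32
/* Assumption: WCHAR is a 16-bit type, as on Windows, so the buffer can be
   passed where an LPCWSTR is expected. */
HANDLE open_example_file(uint32_t cp)
{
    uint16_t path[10];
    build_example_path(cp, path);
    return CreateFileW((LPCWSTR)path, GENERIC_READ, FILE_SHARE_READ, NULL,
                       OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
}
#endif
```

For 0x24B62 the two units come out as 0xD852 and 0xDF62, so the file name occupies two WCHAR elements rather than one.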

Presumably you won't be hard-coding such a filename in your code, in which case you'll be getting it from a file dialog, FindFirstFile, etc. Those APIs will give you the appropriate UTF-16 encoded buffer for the file name.

David Heffernan
  • Do you mean that `CreateFile()` always takes UTF-16 encoded strings, but since the UTF-16 encoding of most commonly used strings is simply the strings themselves, we don't notice this, and it only comes into play when passing strings that contain these complex characters? – hkBattousai Sep 28 '12 at 17:33
  • UTF-16 is a variable-width encoding. Some code points are encoded with one code unit, some with two code units. Read around the topic of surrogate pairs. I gave you a Wikipedia link. – David Heffernan Sep 28 '12 at 17:35
  • I read your link. I want to learn the behavior of `CreateFileW()`. Does it **always** take the `lpFileName` parameter in UTF-16 encoded form? Or do I have to switch into this mode by doing something first? – hkBattousai Sep 28 '12 at 17:54
  • Yes, `CreateFileW` receives a UTF-16 encoded filename. That's the Windows convention. The `W` suffix functions work with UTF-16 data. In fact Windows is natively a UTF-16 system. – David Heffernan Sep 28 '12 at 17:54
  • Thanks, this helped me solve my issue with Python Win32file.CreateFile also. – SilentSteel Oct 04 '13 at 16:53
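
The variable-width point from the comments can be sketched in C as well. The snippet below walks a UTF-16 buffer and decodes one code point at a time, consuming either one or two 16-bit units. The function names `utf16_next` and `utf16_count_code_points` are invented for this example, and passing an unpaired surrogate through unchanged is a simplification chosen here, not something the Win32 API prescribes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode the code point that starts at s[*i] in a
   UTF-16 buffer of len code units, and advance *i by 1 or 2.
   An unpaired surrogate is returned as-is (simplification for this sketch). */
uint32_t utf16_next(const uint16_t *s, size_t len, size_t *i)
{
    assert(*i < len);
    uint16_t hi = s[(*i)++];

    /* A high surrogate followed by a low surrogate forms one code point. */
    if (hi >= 0xD800 && hi <= 0xDBFF && *i < len) {
        uint16_t lo = s[*i];
        if (lo >= 0xDC00 && lo <= 0xDFFF) {
            (*i)++;
            return 0x10000 + (((uint32_t)(hi - 0xD800) << 10) |
                              (uint32_t)(lo - 0xDC00));
        }
    }
    return hi; /* ordinary single-unit character */
}

/* Hypothetical helper: count code points (not code units) in the buffer. */
size_t utf16_count_code_points(const uint16_t *s, size_t len)
{
    size_t i = 0, count = 0;
    while (i < len) {
        utf16_next(s, len, &i);
        count++;
    }
    return count;
}
```

With the file name from the question stored as the six units 0xD852, 0xDF62, '.', 't', 'x', 't', the first call yields 0x24B62 and the buffer holds five code points in six code units.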