First off, this is NOT a duplicate of "Turn a C string with NULL bytes into a char array", because the answer given there doesn't work when the char * strings are Unicode.

I think the problem is that I am using UTF-8 encoded char * strings instead of ASCII ones, so each character can have a different length, and therefore this doesn't work:

char *Buffer;             // your null-separated strings
char *Current;            // Pointer to the current string
// [...]
for (Current = Buffer; *Current; Current += strlen(Current) + 1)
  printf("GetOpenFileName returned: %s\n", Current);

Does anyone have a similar solution that works on Unicode strings?

I have been banging my head on this for over 4 hours now. C doesn't agree with me.

EDIT: I think that the problem is that the char * is now UTF-8 instead of ASCII.

Alec Gorge
  • What do you mean "Unicode", UTF-16? – Matthew Flaschen Apr 11 '10 at 01:50
  • I'm not sure. I am getting this so-called "Unicode" from this function: http://pastebin.com/j1pFrWPa . The name of the function implies a UTF-8 char *, but that confuses me. – Alec Gorge Apr 11 '10 at 01:55
  • I am trying to make my program friendly to those non-ASCII letters (accents, Russian etc), but I still need to be able to have a file picker. – Alec Gorge Apr 11 '10 at 02:02
  • Hmm. I hadn't thought of WideChar2Utf8 stopping at the first null string. That is probably the cause. The thing that is being called with the first argument to WideChar2Utf8 is `TCHAR szFile[MAX_PATH] = TEXT("");` As for my console, I changed my font to be Consolas, but you are right, the cmd line probably still doesn't support utf-8 correctly. It is okay, because once I start breaking the null separated string, I can go back to using the GUI elements. – Alec Gorge Apr 11 '10 at 02:12

1 Answer

Don't use char*. Use wchar_t* and the related functions:

wchar_t *Buffer;             // your null-separated strings
wchar_t *Current;            // Pointer to the current string
// [...]
// wcslen is the wide-character counterpart of strlen; %ls prints a wide string
for (Current = Buffer; *Current; Current += wcslen(Current) + 1)
  wprintf(L"GetOpenFileName returned: %ls\n", Current);

Incidentally, wchar_t is 16 bits wide on Windows (one UTF-16 code unit), not a variable-width byte sequence as in UTF-8. If your source data is UTF-8 encoded as char*, you should first convert it to wchar_t* to work with it.
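
To make the loop above easier to try outside of a Windows project, here is a minimal sketch in portable C. It assumes the buffer layout the loop relies on: wide strings separated by a single L'\0', with one extra L'\0' closing the list. The name `for_each_wide_string` and the optional `visit` callback are hypothetical, made up for this illustration; only `wcslen` comes from the standard library.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: walks a double-null-terminated list of wide strings.
   Calls visit (when it is not NULL) once per string and returns how many
   strings were found. Assumes the list really ends with an empty string. */
size_t for_each_wide_string(const wchar_t *buffer,
                            void (*visit)(const wchar_t *))
{
    size_t count = 0;
    const wchar_t *cur;

    /* wcslen counts wchar_t elements, not bytes, so adding it to a
       wchar_t pointer lands on the separator; + 1 steps past it. */
    for (cur = buffer; *cur != L'\0'; cur += wcslen(cur) + 1) {
        if (visit != NULL)
            visit(cur);
        count++;
    }
    return count;
}
```

Called on a buffer holding `L"H:\\"`, `L"files.txt"` and `L"md5.php"` followed by the closing terminator, this should visit three strings. Passing NULL for `visit` just counts them.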

Marcelo Cantos
  • I tried that. I chose the file H:\files.txt and this is the output: http://pastebin.com/s9vJYiFp – Alec Gorge Apr 11 '10 at 01:48
  • how do I convert it from UTF-8 char * to wchar_t* ? – Alec Gorge Apr 11 '10 at 01:49
  • If I leave everything the same (just using UTF-8 char*) I only get the first result (`H:\ `) not `H:\ ` and `files.txt` like I am expecting. – Alec Gorge Apr 11 '10 at 01:58
  • Your pastebin output happens because your data is already UTF-16, but you are holding it with a `char*` pointer, so it looks like `"H\0:\0\\\0f\0i\0l\0e\0s\0.\0t\0x\0t\0"`, i.e., lots of single-character strings. I'll need to see more surrounding code to understand what you're doing wrong. – Marcelo Cantos Apr 11 '10 at 02:03
  • Ah, that makes sense! The problematic char: `Buffer = WideChar2Utf8(szFile, &file_len)`. http://pastebin.com/j1pFrWPa is the code of WideChar2Utf8. – Alec Gorge Apr 11 '10 at 02:07
  • If I try doing `for(Current = szFile...` I get something that supports your hypothesis: http://pastebin.com/E6aSeHwh So now what do I do? – Alec Gorge Apr 11 '10 at 02:21
  • Since that comment can't be edited... If I try doing `for(Current = szFile...` and selected 2 files (H:\files.txt and H:\md5.php) I get: http://pastebin.com/E6aSeHwh So now what do I do? – Alec Gorge Apr 11 '10 at 02:27
  • I'm confused. Why are you trying to convert to UTF-8? Just use the LPCTSTR type directly with the TCHAR versions of functions (`_tcslen` and `_tprintf`). – Marcelo Cantos Apr 11 '10 at 04:31
  • I don't know what I was doing! Thanks for your help! – Alec Gorge Apr 13 '10 at 01:20
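
The diagnosis in the comments (UTF-16 data read through a `char*`) can be reproduced with a small portable C sketch. It runs the char-based loop from the question over two hand-written byte buffers: one laid out as little-endian UTF-16, one as UTF-8. The function name `count_narrow_strings` and both sample buffers are assumptions made up for this illustration, not the asker's actual data.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the strlen-based loop from the question, returning
   how many "strings" it finds before it reaches an empty one. */
size_t count_narrow_strings(const char *buffer)
{
    size_t count = 0;
    const char *cur;

    for (cur = buffer; *cur != '\0'; cur += strlen(cur) + 1)
        count++;
    return count;
}

/* "H:\" as little-endian UTF-16 bytes: every ASCII character is followed
   by a zero byte, so strlen sees three one-character strings. */
const char utf16le_bytes[] = "H\0:\0\\\0\0\0";

/* "H:\" and "café.txt" as UTF-8: the é takes two bytes (0xC3 0xA9), but
   no zero byte appears inside a string, so strlen still finds its end. */
const char utf8_bytes[] = "H:\\\0caf\xc3\xa9.txt\0";
```

With these buffers, `count_narrow_strings(utf16le_bytes)` should return 3 (the pieces `H`, `:` and `\`), while `count_narrow_strings(utf8_bytes)` should return 2. In other words, the original loop is fine for UTF-8, since UTF-8 only uses a zero byte for U+0000. It is the UTF-16 buffer that needs the wide-character functions.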