
I'm trying to write C code that only needs to be portable to systems where the user has gcc and GLib installed.

From all my research, I've found that with gcc a wchar_t is always 4 bytes, and with GLib a gunichar is also 4 bytes.

What I haven't figured out is whether a wchar_t, like a gunichar, is encoded as UCS-4. Is this the case? If so, I should be able to simply cast a gunichar* to a wchar_t* and use the standard C wcs* functions, right?
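If you do want to try the cast, one way to make the assumption explicit is to refuse to compile unless the platform promises UCS-4 `wchar_t`. This is a sketch assuming a C11 compiler; it uses `uint32_t` as a stand-in for `gunichar` (GLib defines `gunichar` as a 32-bit unsigned type) so it builds without GLib headers, and `codepoint_t` and `ucs4_length` are hypothetical names. `__STDC_ISO_10646__` is the standard C macro an implementation defines when `wchar_t` values are ISO 10646 code points; glibc defines it. Even with both checks passing, the cast relies on `wchar_t` and `uint32_t` differing at most in signedness (as with gcc on Linux), so treat it as an illustration rather than a recommendation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Assumption: uint32_t stands in for GLib's gunichar so this sketch
 * compiles without GLib headers. */
typedef uint32_t codepoint_t;

/* Refuse to build unless the C library promises that wchar_t values are
 * ISO 10646 (UCS-4) code points... */
#if !defined(__STDC_ISO_10646__)
#error "wchar_t is not guaranteed to hold ISO 10646 code points here"
#endif

/* ...and unless wchar_t is exactly as wide as a 32-bit code point. */
_Static_assert(sizeof(wchar_t) == sizeof(codepoint_t),
               "wchar_t is not 32 bits on this platform");

/* Counts code points in a 0-terminated UCS-4 buffer by reinterpreting it
 * as a wide string. Only meaningful when both checks above pass. */
size_t ucs4_length(const codepoint_t *s)
{
    assert(s != NULL);
    return wcslen((const wchar_t *)s);
}
```

On a platform where either check fails, the build stops instead of silently misreading the data.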

Mateusz Piotrowski
ckot

1 Answer


If you use GLib, don't use wchar_t. Use GLib's Unicode support; it's a lot better than the C standard library's.

wchar_t is 4 bytes on Linux and Mac OS (and a few others), but not on Windows (where it is 2 bytes) and some others. Portable code means avoiding wchar_t like the plague.
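One way to follow that advice is to store code points in a fixed-width type and not lean on `wchar_t` at all. In this sketch, `u32_length` and `utf16_units` are hypothetical helpers: the first replaces `wcslen` for 0-terminated `uint32_t` arrays, and the second shows why a 2-byte `wchar_t` cannot hold every code point directly (anything above U+FFFF needs a UTF-16 surrogate pair). It assumes its inputs are valid code points (at most U+10FFFF).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts code points in a 0-terminated array of 32-bit code points.
 * Unlike wcslen(), the element width does not depend on how wide the
 * platform's wchar_t happens to be. */
size_t u32_length(const uint32_t *s)
{
    const uint32_t *p = s;
    assert(s != NULL);
    while (*p != 0)
        ++p;
    return (size_t)(p - s);
}

/* Number of 16-bit units a code point occupies in UTF-16, which is what
 * a 2-byte wchar_t (as on Windows) would have to store. */
unsigned utf16_units(uint32_t cp)
{
    assert(cp <= 0x10FFFFu);
    return cp > 0xFFFFu ? 2u : 1u;
}
```

For example, U+1F600 is one element in a `uint32_t` array but two units in UTF-16, so code that assumes one `wchar_t` per character breaks on Windows.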

rubenvb
  • Hmm, thanks. I just noticed that almost all of the GLib Unicode functions operate on UTF-8 strings, and from what I understand (could be wrong) iterating through a multi-byte encoded char array is inefficient, because you need an iterator to make sure you get a full character and not just a byte (you can't simply i++ your way through the array). I just re-checked the docs and g_utf8_next_char() is implemented as a macro, so I guess it's not so much of an issue for me anymore. Thanks again. – ckot Mar 24 '12 at 11:20
  • @ckot Decent Unicode support is costly any way you put it. Worry about performance once your program/library works, not before. – rubenvb Mar 24 '12 at 11:23
  • Good point. Things seem to be working well so far, so I'll cross that bridge if/when I need to. Thanks again. – ckot Mar 24 '12 at 12:04
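Because `g_utf8_next_char()` is a macro, stepping through UTF-8 costs little more than inspecting the lead byte of each character. The following is a hand-rolled analogue for illustration, not GLib's implementation; `utf8_skip` and `utf8_count_chars` are hypothetical names. Like GLib's macro, it assumes the input is already valid UTF-8 (a truncated multi-byte sequence could step past the terminator).

```c
#include <assert.h>
#include <stddef.h>

/* Length in bytes of the UTF-8 sequence that starts with `lead`.
 * Assumes valid UTF-8; a stray continuation byte is stepped over singly. */
static size_t utf8_skip(unsigned char lead)
{
    if (lead < 0x80) return 1;  /* 0xxxxxxx: ASCII */
    if (lead >= 0xF0) return 4; /* 11110xxx: 4-byte sequence */
    if (lead >= 0xE0) return 3; /* 1110xxxx: 3-byte sequence */
    if (lead >= 0xC0) return 2; /* 110xxxxx: 2-byte sequence */
    return 1;                   /* 10xxxxxx: continuation byte */
}

/* Counts characters (code points) rather than bytes by jumping from one
 * lead byte to the next instead of doing i++. */
size_t utf8_count_chars(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (*s != '\0') {
        s += utf8_skip((unsigned char)*s);
        ++n;
    }
    return n;
}
```

Each step is a few comparisons, so the per-character overhead compared with `i++` over a UCS-4 array is small; as rubenvb says, it is worth measuring only once the program works.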