
The Qt documentation states that the following Unicode string encodings (among others) are supported:

  • UTF-8
  • UTF-16
  • UTF-16BE
  • UTF-16LE
  • UTF-32
  • UTF-32BE
  • UTF-32LE

Since three different codecs are listed for each of the 2-octet and 4-octet Unicode encodings, I was wondering: how do the two codecs that do not name an endianness ("UTF-16" and "UTF-32") decide which byte order to use?

Dan Cecile
Samuel Harmer

1 Answer


Based on the source code in src/corelib/codecs/, it seems Qt uses the byte order of the host for UTF-16 and UTF-32 when no endianness is specified.

If you use QTextCodec to decode existing Unicode data that starts with a BOM, and you didn't explicitly ask to ignore the header, the byte order indicated by the BOM is used instead.
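To make the BOM case concrete, here is a minimal sketch of how a UTF-16 byte order mark can be recognized. It is plain C++ and not Qt's actual code; the names ByteOrder and detectUtf16Bom are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical names for illustration only; this is not Qt's API.
enum class ByteOrder { Unknown, Big, Little };

// Inspect the first two bytes of a UTF-16 buffer for a byte order mark.
// U+FEFF is serialized as FE FF in big-endian and as FF FE in little-endian.
inline ByteOrder detectUtf16Bom(const std::uint8_t *data, std::size_t size)
{
    if (size < 2)
        return ByteOrder::Unknown;
    if (data[0] == 0xFE && data[1] == 0xFF)
        return ByteOrder::Big;
    if (data[0] == 0xFF && data[1] == 0xFE)
        return ByteOrder::Little;
    return ByteOrder::Unknown; // no BOM: the caller has to pick a default
}
```

When the result is Unknown, a decoder has to fall back to some default, which according to the source code discussed here is the host byte order.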

  • In qutfcodec_p.h, both QUtf16Codec::e and QUtf32Codec::e are initialized with the value DetectEndianness (an enum).

  • In qutfcodec.cpp, near the beginning of the functions convertFromUnicode and convertToUnicode of the classes QUtf16 and QUtf32 (used by QUtf16Codec and QUtf32Codec), you can find the line:

    endian = (QSysInfo::ByteOrder == QSysInfo::BigEndian) 
        ? BigEndianness : LittleEndianness;
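
The fallback described above can be sketched without Qt as follows. This is an illustration under assumptions, not the library's implementation: the enum reuses the value names quoted from the source, while hostEndianness, resolveEndianness and encodeUnit are hypothetical helpers, and the host check is done at run time instead of through QSysInfo::ByteOrder.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Value names mirror those quoted from the Qt source; everything else is hypothetical.
enum Endianness { DetectEndianness, BigEndianness, LittleEndianness };

// Determine the host byte order by looking at the first byte of a known 16-bit value.
inline Endianness hostEndianness()
{
    const std::uint16_t probe = 0x0102;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 0x01 ? BigEndianness : LittleEndianness;
}

// An explicit request wins; DetectEndianness falls back to the host byte order.
inline Endianness resolveEndianness(Endianness requested)
{
    return requested == DetectEndianness ? hostEndianness() : requested;
}

// Serialize one UTF-16 code unit in the resolved byte order.
inline std::vector<std::uint8_t> encodeUnit(char16_t unit, Endianness requested)
{
    const Endianness e = resolveEndianness(requested);
    const std::uint8_t hi = static_cast<std::uint8_t>(unit >> 8);
    const std::uint8_t lo = static_cast<std::uint8_t>(unit & 0xFF);
    return e == BigEndianness ? std::vector<std::uint8_t>{hi, lo}
                              : std::vector<std::uint8_t>{lo, hi};
}
```

With this sketch, encodeUnit(u'A', DetectEndianness) yields 41 00 on a little-endian host and 00 41 on a big-endian one, which is the same kind of host-dependent result the plain "UTF-16" codec appears to produce.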
    
alexisdm