I'm trying to write a codec for code page 437. My plan was to pass the ASCII characters through unchanged and map the remaining 128 characters via a table, using the UTF-16 value as the key.
For some accented characters (letters with diaereses, tildes, and so on), the character appears to occupy two QChars.
Here is a test program that prints the UTF-16 values of its command-line arguments:
#include <iostream>
#include <QString>

using namespace std;

void print(QString qs)
{
    // Print each UTF-16 code unit of the string in hex.
    for (QString::iterator it = qs.begin(); it != qs.end(); ++it)
        cout << hex << it->unicode() << " ";
    cout << "\n";
}

int main(int argc, char *argv[])
{
    for (int i = 1; i < argc; i++)
        print(QString::fromStdString(argv[i]));
}
Some output:
$ ./utf16 Ç ü é
c3 87
c3 bc
c3 a9
I had expected
c387
c3bc
c3a9
I tried the various normalization forms available in QString, but none of them produced fewer code units than the default.
Since a QChar is two bytes, it should be able to hold each of the characters above in a single object. Why does the QString use two QChars per character? How can I fetch the combined Unicode value?