I was trying to use 'ZERO WIDTH SPACE' (U+200B) in a Qt 4.8 project compiled with MSVC 9.0 and ran into a problem:

// This works
QString::fromUtf8("Test​String"); // Note the zero width space in between the words
// This prints a '?'
QString::fromUtf8("Test\u200BString");

Is there a reason this happens? Is this a known bug, or am I overlooking something? The same code works flawlessly with Qt 5.6.

Edit:

// This also works with Qt 4.8
QString::fromUtf8("Test%1String").arg(QChar(0x200b));

As @Scheff's Cat mentioned, this suggests the problem isn't Qt itself but MSVC 9.0, which doesn't seem to handle '\u200B' in a narrow string literal. I will investigate this further and keep this question updated.
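
For reference, here is a minimal sketch of the bytes that QString::fromUtf8() would need to receive for U+200B, computed at run time so that the result doesn't depend on how the compiler translates '\u200B'. The helper name encodeUtf8 is hypothetical (it is not a Qt or standard library function), and it deliberately skips validation of surrogate code points.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of Qt): encode one Unicode code point as UTF-8.
// Assumption: the caller passes a valid scalar value; surrogates are not rejected.
std::string encodeUtf8(unsigned long cp)
{
    assert(cp <= 0x10FFFFUL); // highest code point defined by Unicode

    std::string out;
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx (U+200B falls in this range)
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For 0x200B this yields the three bytes e2 80 8b. Something like QString::fromUtf8(("Test" + encodeUtf8(0x200B) + "String").c_str()) should then behave like the working variants above, although I have only reasoned through that call, not run it against Qt 4.8.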

Baumflaum
  • Which compiler do you use? – gerum Nov 09 '21 at 08:55
  • MSVC 9.0 (VS2008) for Qt 4.8, with UNICODE and _UNICODE defined. MinGW with Qt's default version for 5.6. – Baumflaum Nov 09 '21 at 10:14
  • I strongly suspect that `\u200B` doesn't result in the UTF-8 sequence you intend (especially not with the rather dated VS2008). Try `\xe2\x80\x8b` instead. (That's the corresponding UTF-8 sequence, which I just looked up in the [UTF-8 Table](https://www.utf8-chartable.de/unicode-utf8-table.pl).) – Scheff's Cat Nov 09 '21 at 10:33
  • I struggled to believe it and copied your other `"Test​String"`. I have to admit that I did in fact get the hex dump: `54 65 73 74 e2 80 8b 53 74 72 69 6e 67`. (I could've saved the time for looking into the UTF-8 table...) ;-) – Scheff's Cat Nov 09 '21 at 10:37
  • I think it isn't related to the Qt version but to the compiler you have to use to build for the respective version... – Scheff's Cat Nov 09 '21 at 10:38
  • You helped me get on the right track! Thank you very much. If you post this as an answer, I could accept it. – Baumflaum Nov 09 '21 at 11:54
  • Did you also try `"Test\xe2\x80\x8bString"`? IMHO, this should've worked as well, even with VS2008. It provides the UTF-8 sequence for `U+200B` byte for byte. Although VS2008 doesn't have UTF-8 support, it will simply build this string literal into the executable, and `QString::fromUtf8()` should then decode the bytes properly. – Scheff's Cat Nov 09 '21 at 17:14
  • That did not work; it failed with an error along the lines of "too big for character". I will investigate it further and check [this answer](https://stackoverflow.com/questions/13692200/c-string-literal-too-big-for-character?rq=1). I'll keep you updated. Something's fishy here. – Baumflaum Nov 09 '21 at 20:06
  • The VS2008 compiler expects the source code to be in the current ANSI code page (probably Windows-1252). That code point isn't supported in that code page, so it is replaced with a replacement character (?). When you entered the actual character in what I suppose is UTF-8-encoded source, the character was actually encoded as UTF-8 bytes. Newer compiler versions have a /utf-8 switch that tells the compiler to use UTF-8 as the source encoding. – Mark Tolonen Nov 09 '21 at 20:07
  • So [this thread about MSVC 9.0](https://stackoverflow.com/questions/688760/how-to-create-a-utf-8-string-literal-in-visual-c-2008) describes the solution to this problem. Thank you very much for that pointer. :) – Baumflaum Nov 09 '21 at 20:15
