
I have a problem with QJsonObject character encoding. QJsonObject::toJson() returns a string with international characters as hex values:

s:  "żółć"
obj:  QJsonObject({"s":"żółć"})
doc:  QJsonDocument({"s":"żółć"})
JSON:  "{\n    \"s\": \"\xC5\xBC\xC3\xB3\xC5\x82\xC4\x87\"\n}\n"

Code:

#include <QCoreApplication>
#include <QDebug>
#include <QJsonDocument>
#include <QJsonObject>

int main(int argc, char *argv[])
{
    QCoreApplication a(argc, argv);

    QString s = "żółć";
    qDebug() << "s: " << s;

    QJsonObject obj;
    obj["s"] = s;
    qDebug() << "obj: " << obj;

    QJsonDocument doc(obj);
    qDebug() << "doc: " << doc;


    qDebug() << "JSON: " << doc.toJson();


    return a.exec();
}

How can I get a JSON string with international characters?

MrEricSir
Sajmplus
    No. That is qDebug() in Qt Creator debug output console: http://stackoverflow.com/questions/16884570/qdebug-doesnt-support-unicode-strings-on-windows JSON is good with UNICODE by definition. – Alexander V Oct 07 '16 at 15:18

1 Answer


QJsonObject::toJson() returns string with national characters as hex values

Assuming you meant QJsonDocument::toJson(), this is not true. QJsonDocument::toJson() returns a QByteArray encoded in UTF-8, so it can represent every character (code point) defined by Unicode. You can then send or save the resulting QByteArray, and parsing it later gives you back your original string.
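To make the point about UTF-8 concrete, here is a minimal sketch in plain C++ (no Qt) that decodes the eight bytes from the question back into four code points. The helper name decodeUtf8 is hypothetical, and the sketch assumes well-formed input and does no validation. In Qt code you would call QString::fromUtf8() instead.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Qt): decodes well-formed UTF-8 into
// Unicode code points. Malformed input is not detected or rejected.
std::vector<char32_t> decodeUtf8(const std::string &bytes)
{
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        int extra = 0;      // number of continuation bytes that follow
        char32_t cp = 0;    // code point being assembled
        if (lead < 0x80)      { cp = lead;        extra = 0; }  // 0xxxxxxx
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }  // 110xxxxx
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }  // 1110xxxx
        else                  { cp = lead & 0x07; extra = 3; }  // 11110xxx
        ++i;
        // Each continuation byte (10xxxxxx) contributes six payload bits.
        for (int k = 0; k < extra && i < bytes.size(); ++k, ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i]) & 0x3F);
        }
        out.push_back(cp);
    }
    return out;
}
```

Calling decodeUtf8("\xC5\xBC\xC3\xB3\xC5\x82\xC4\x87") yields the four code points U+017C, U+00F3, U+0142 and U+0107, which are ż, ó, ł and ć.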

So the hex escapes you are seeing in the debug output are not really in the data; they are just QDebug's way of printing a QByteArray:

Normally, QDebug prints the array inside quotes and transforms control or non-US-ASCII characters to their C escape sequences (\xAB). This way, the output is always 7-bit clean and the string can be copied from the output and pasted back into C++ sources, if necessary.
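As a rough illustration of that behaviour, the following plain C++ sketch imitates the escaping on a byte string. It is not Qt's implementation: the function names escapeForDisplay and zolcUtf8 are hypothetical, and the exact rules (which bytes get escaped, how quotes are handled) are assumptions chosen to match the output shown in the question.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not a Qt function): wraps a byte string in quotes and
// writes every control or non-ASCII byte as an escape, similar in spirit to
// what the question's debug output shows.
std::string escapeForDisplay(const std::string &bytes)
{
    std::string out = "\"";
    for (unsigned char c : bytes) {
        if (c == '\n') {
            out += "\\n";                       // newline shown as \n
        } else if (c == '"' || c == '\\') {
            out += '\\';                        // quote and backslash get a backslash
            out += static_cast<char>(c);
        } else if (c < 0x20 || c >= 0x7F) {
            char buf[5];                        // "\xNN" plus terminating NUL
            std::snprintf(buf, sizeof buf, "\\x%02X", static_cast<unsigned>(c));
            out += buf;
        } else {
            out += static_cast<char>(c);        // printable ASCII is copied as-is
        }
    }
    out += '"';
    return out;
}

// The UTF-8 bytes of "żółć", written with escapes so that the result does not
// depend on the encoding of the source file.
std::string zolcUtf8()
{
    return "\xC5\xBC\xC3\xB3\xC5\x82\xC4\x87";
}
```

The byte string itself is unchanged by this: zolcUtf8() still holds eight bytes of UTF-8, and only the displayed form produced by escapeForDisplay() contains the \xNN sequences.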

You can use qDebug().noquote() to see what your JSON byte array looks like without those characters being escaped:

qDebug().noquote() << "JSON: " << doc.toJson();

Or, you can print it as a QString instead:

qDebug() << "JSON: " << QString::fromUtf8(doc.toJson());

Note:

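It is very bad practice to put non-ASCII characters directly in string literals, because the bytes the compiler emits depend on the source file's encoding and the compiler settings. You should read such strings from a resource, or use an escaped string literal and specify its encoding explicitly (for example with QString::fromUtf8() or QString::fromUtf16()).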

Mike
  • Regarding string literals in Qt, this is exactly what the [QStringLiteral](http://doc.qt.io/qt-5/qstring.html#QStringLiteral) macro was made for. – MrEricSir Oct 07 '16 at 17:25
  • @MrEricSir, the `QStringLiteral` macro was made specifically to generate a `QString` out of a literal at compile time, in order to avoid the overhead of generating one at run time. I am not sure whether one can safely use locale-dependent non-ASCII characters in literals in source files when using it. Can you confirm that? – Mike Oct 07 '16 at 17:47
  • @MrEricSir, since `QStringLiteral` seems to use wide string literals [here](https://code.qt.io/cgit/qt/qtbase.git/tree/src/corelib/tools/qstring.h#n139) and [here](https://code.qt.io/cgit/qt/qtbase.git/tree/src/corelib/tools/qstring.h#n148), it may be safe. But I would never expect something like `QString("żółć")` to be safe. Am I right about that? – Mike Oct 07 '16 at 17:54
  • As long as you're using an up-to-date compiler, `QStringLiteral` should always perform the correct conversion. I believe the `QString` constructor that takes a string literal will only convert from UTF-8, which may or may not be what's desired. – MrEricSir Oct 07 '16 at 18:04
  • Ok, so how should I get user input with international characters and build JSON with them? – Sajmplus Oct 10 '16 at 20:55
  • @Sajmplus, there is no problem with that. The problem we are discussing here applies only to string literals. Using `QLineEdit::text()` to get a `QString` from user input, for example, is sure to be all right. – Mike Oct 10 '16 at 21:37
  • And how can I make a `QNetworkAccessManager::post` request to a Web API using the encoded `QByteArray`? When I POST `Żółć` with curl it seems to be OK, but when I POST the UTF-8 encoded `\xC5\xBB\xC3\xB3\xC5\x82\xC4\x87` I get `????` as the reply. – Sajmplus Oct 18 '16 at 18:48