
Possible Duplicate:
I lose “unicodeness” when qDebug()ing after instancing a QApplication

I am trying to use Unicode characters in my project, but they are being converted to other values (e.g. ?).

#include <QtCore/QCoreApplication>
#include <QTextCodec>
#include <QDebug>
int main(int argc, char *argv[]) {
  QCoreApplication a(argc, argv);
  QTextCodec *codec = QTextCodec::codecForName("UTF-8");
  QTextCodec::setCodecForCStrings(codec);
  // The apostrophe in the string below is U+2019 (right single quotation mark), not the ASCII '
  QString unicode = "Hello I’ve to go";
  qDebug() << "Unicode String: " << unicode;
  return a.exec();
}

The above code prints the string as: Hello I?ve to go

JChan
  • Hardcoded strings are char, thus you can only display ASCII. You want to use wide char, which can be achieved using L"€" for example, but to be honest ... don't use hardcoded strings in your code. Not every compiler can use unicode in its editor. – dowhilefor Oct 10 '12 at 15:42
  • @dowhilefor, in my real project, this value is read from the file name. I tried a similar implementation in my project, but it didn't work. – JChan Oct 10 '12 at 15:46
  • The apostrophe character (') is part of utf-8 and ASCII, so is the grave accent (`) character – Gearoid Murphy Oct 10 '12 at 15:46
  • That's not the normal apostrophe character. That's the Unicode character U+2019. [Link](http://www.fileformat.info/info/unicode/char/2019/index.htm) – Cornstalks Oct 10 '12 at 15:54
  • So in which variable do you have the Unicode string? Was it a char* or a wchar_t* before? At some point you are reading the value from, let's say, a text file, and at that point you already have some kind of conversion. QString has methods to parse from a Unicode string; have you tried these? – dowhilefor Oct 10 '12 at 16:58

3 Answers

2

C++11 adds support for Unicode characters. Try properly escaping that Unicode character:

u8"Hello I\u2019ve to go"

This works for me. Though it's entirely possible you don't have the U+2019 codepoint in your font, so it's drawing it as a ? in place of the proper character.

Cornstalks
  • Universal Character Names were supported in C++98, and this doesn't solve the issue, which is that the execution charset can't represent that character. – bames53 Oct 10 '12 at 16:00
  • Yes, adding `u8` will fix the problem on compilers that support that. Escaping the apostrophe is [irrelevant](http://ideone.com/30NhV). Also I suspect that the individual asking this question is not using gcc so this doesn't help. – bames53 Oct 10 '12 at 16:12
0

String literals are converted by the compiler from the source encoding to the execution encoding. The execution encoding you're using evidently can't handle that character so it's replaced with '?'.
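The substitution described above can be imitated in plain C++. This is only a minimal sketch, not what any particular compiler does internally: it assumes the literal arrives as UTF-8 bytes and that the execution encoding is Latin-1. The helper names `narrowToLatin1` and `demoNarrowing` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: decode UTF-8 and re-encode as Latin-1, replacing
// every code point that Latin-1 cannot represent with '?'.
// Stray or truncated UTF-8 sequences also become '?'.
std::string narrowToLatin1(const std::string &utf8) {
  std::string out;
  std::size_t i = 0;
  while (i < utf8.size()) {
    unsigned char lead = static_cast<unsigned char>(utf8[i]);
    unsigned long cp = 0;
    std::size_t len = 1;
    if (lead < 0x80) {
      cp = lead;                               // plain ASCII
    } else if ((lead & 0xE0) == 0xC0) {
      cp = lead & 0x1F; len = 2;               // 2-byte sequence
    } else if ((lead & 0xF0) == 0xE0) {
      cp = lead & 0x0F; len = 3;               // 3-byte sequence
    } else if ((lead & 0xF8) == 0xF0) {
      cp = lead & 0x07; len = 4;               // 4-byte sequence
    } else {
      out += '?'; ++i; continue;               // stray continuation byte
    }
    if (i + len > utf8.size()) { out += '?'; break; }  // truncated input
    for (std::size_t k = 1; k < len; ++k) {
      cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
    }
    // Latin-1 covers U+0000..U+00FF only; everything else is lost.
    out += (cp <= 0xFF) ? static_cast<char>(static_cast<unsigned char>(cp)) : '?';
    i += len;
  }
  return out;
}

// Reproduces the symptom from the question: U+2019 turns into '?'.
void demoNarrowing() {
  const std::string input = "Hello I" "\xE2\x80\x99" "ve to go";  // U+2019 as UTF-8 bytes
  assert(narrowToLatin1(input) == "Hello I?ve to go");
}
```

Calling `demoNarrowing()` from a `main` of your own should pass silently. The point is that the character is already gone before Qt ever sees the string.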

You need to either choose a different execution encoding if your compiler supports that (gcc does with the flag -fexec-charset) or trick compilers that don't support that (such as Visual Studio) into not doing this conversion by lying to it about what the source encoding is.

You can lie to VS about the source encoding by setting your source code to UTF-8 without a signature. VS will assume the source encoding is the system's "encoding for non-Unicode programs" which is the same as it uses for the execution encoding. Since it will believe that the encodings are the same it will not perform any conversion and the string literal will be UTF-8. You'll have to be careful to avoid anything else in your source code where the compiler needs to know the correct encoding though. For example if you do this then wide string literals will not be converted correctly.

Another solution would be the new C++11 UTF-8 string literals: u8"Hello I’ve to go". These are converted by the compiler from the source encoding to UTF-8, rather than to the execution encoding. Unfortunately Visual Studio does not yet support UTF-8 string literals.
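To see which bytes your compiler actually stored for a literal, you can dump them. The following is a rough sketch under stated assumptions: `toHex`, `literalIsUtf8` and `demoHex` are hypothetical names, and the result of `literalIsUtf8` depends on the execution encoding your compiler uses. U+2019 encoded as UTF-8 is the three bytes E2 80 99, so a literal holding anything else was converted to some other encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: hex dump of a byte string, e.g. "e2 80 99".
std::string toHex(const std::string &bytes) {
  static const char digits[] = "0123456789abcdef";
  std::string out;
  for (std::size_t i = 0; i < bytes.size(); ++i) {
    unsigned char b = static_cast<unsigned char>(bytes[i]);
    if (i != 0) out += ' ';
    out += digits[b >> 4];
    out += digits[b & 0x0F];
  }
  return out;
}

// Hypothetical check: true if the compiler stored U+2019 in an ordinary
// string literal as the UTF-8 bytes E2 80 99 (plus the terminating NUL).
// The answer depends on the compiler's execution encoding.
bool literalIsUtf8() {
  const char literal[] = "\u2019";
  return sizeof(literal) == 4 && std::memcmp(literal, "\xE2\x80\x99", 3) == 0;
}

// The explicit byte escapes do not depend on the execution encoding.
void demoHex() {
  assert(toHex("\xE2\x80\x99") == "e2 80 99");
}
```

If `literalIsUtf8()` returns false on your setup, passing such a literal to a function that expects UTF-8 will not give you U+2019.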


In a comment above you say "In my real project, this value is read from the file name." This indicates a completely different problem than the one demonstrated in your question. Solving this will require details about how exactly you get the file name.

Showing you how to fix the code that you posted will not fix your actual problem, because the problem in the code you posted and your actual problem are different. There will not be a 'generic solution' that solves both.

bames53
  • I am using QtCreator 4.6.3 for my development. – JChan Oct 10 '12 at 16:08
  • @JChan on what platform and using which compiler? QtCreator on Windows can use either MSVC or the mingw version of gcc. – bames53 Oct 10 '12 at 16:15
  • QTCreator on Windows 7 64 bit professional and mingw32 – JChan Oct 10 '12 at 16:20
  • @JChan Then I believe you can use the solution I suggested for gcc, setting the flag `-fexec-charset=utf-8` to make the code you posted in your question work. However, from the comment you posted indicating that your real code is different I believe you'll need to do something else to solve the problem in your real code. You may want to post a new question containing a minimal example of what your real code is doing. – bames53 Oct 10 '12 at 16:25
  • "In my real project, this value is read from the file name. I tried a similar implementation in my project, but it didn't work." I wrote this comment in response to dowhilefor's post regarding the hard-coded string. – JChan Oct 10 '12 at 16:27
  • @JChan Your code uses hard-coded data but your comment indicates that your real problem has nothing to do with hard-coded data. If you want your problem with a runtime string solved you should post code that does not use hard-coded data. – bames53 Oct 10 '12 at 16:35
0

test.txt (utf-8)

Hello I’ve to go.

Here is another test.

main.cpp

#include <QtCore>

int main(int argc, char *argv[]) {
  QCoreApplication a(argc, argv);
  // L"..." is a wchar_t array, so convert it with fromWCharArray
  QString unicode = QString::fromWCharArray(L"Hello I\u2019ve to go");
  qDebug() << "Unicode String: " << unicode;

  QFile in_file("test.txt");

  if (!in_file.open(QIODevice::ReadOnly | QIODevice::Text)) {
    return -1;
  }

  QTextStream in(&in_file);
  while(!in.atEnd()) {
    QString line = in.readLine();
    qDebug() << line;
  }
}

Output:

Unicode String:  "Hello I’ve to go" 
"Hello I’ve to go." 
"Here is another test." 

This works for both a hard-coded value and a value read at run time. I suspect something is going wrong at the point where you read the file, e.g. you are using the wrong encoding or converting to Latin-1.
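As an illustration of the "wrong encoding" case, here is a minimal sketch in plain C++ (no Qt) of what happens when UTF-8 bytes are mistaken for Latin-1 and then converted again. The names `latin1ToUtf8` and `demoMisdecode` are hypothetical, and this is only one of several ways a string can be mangled.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: treat every input byte as one Latin-1 character
// and encode it as UTF-8. This is correct for real Latin-1 input, but
// wrong when the input was already UTF-8.
std::string latin1ToUtf8(const std::string &latin1) {
  std::string out;
  for (unsigned char b : latin1) {
    if (b < 0x80) {
      out += static_cast<char>(b);                  // ASCII is unchanged
    } else {
      out += static_cast<char>(0xC0 | (b >> 6));    // lead byte
      out += static_cast<char>(0x80 | (b & 0x3F));  // continuation byte
    }
  }
  return out;
}

// U+2019 is three bytes in UTF-8. Misread as Latin-1, each byte becomes
// its own character, so one apostrophe turns into three characters.
void demoMisdecode() {
  const std::string apostrophe = "\xE2\x80\x99";
  const std::string mangled = latin1ToUtf8(apostrophe);
  assert(mangled.size() == 6);
  assert(mangled == std::string("\xC3\xA2\xC2\x80\xC2\x99"));
}
```

If your output shows several odd characters where one was expected, rather than a single ?, this kind of double conversion is a likely suspect.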

Dave Mateer
  • Thanks for the solution. In the real scenario, the input string is received at run time, so replacing Unicode characters with the corresponding escape values will be a problem in my implementation. I am looking for a generic solution. – JChan Oct 10 '12 at 16:24