
All my source files are UTF-8 encoded.

All the files I'm opening are UTF-8.

My application opens a UTF-8 encoded file that contains translated text for 3 languages (English, Polish and Russian) and saves the data to a single file as 3 separately encoded blocks: Windows-1250 (English), Windows-1250 (Polish) and Windows-1251 (Russian). Yes, that's right: I'm mixing encodings inside one file, which is then used by a third-party device that knows how to handle it.

I've got a test program which worked flawlessly under Qt4, but it stopped working (text is saved as ????????) when I moved to Qt5:

  • test_encoding.cpp

    test_encoding::test_encoding(QWidget *parent) : QMainWindow(parent)
    {
      ui.setupUi(this);
    
      QString d;
      QFile f(QDir::currentPath() + "/input.txt");
      if( f.open( QIODevice::ReadOnly | QIODevice::Text ) )
      {
        d = f.readAll();
        f.close();
      }
    
      QFile ff(QDir::currentPath() + "/output.txt");
      if( ff.open( QIODevice::WriteOnly | QIODevice::Text ) )
      {
        QTextStream t(&ff);
        auto cutf8 = QTextCodec::codecForName("UTF-8");
        auto cw50 = QTextCodec::codecForName("windows-1250");
        auto cw51 = QTextCodec::codecForName("windows-1251");
    
        // ____Block 1
        t.setCodec(cutf8);
        t << d << "\r\n";
        t << cutf8->fromUnicode(d) << "\r\n";
        t.flush();
    
        // ____Block 2
        t.setCodec(cw50);
        t << d << "\r\n";
        t << cw50->fromUnicode(d) << "\r\n";
        t.flush();
    
        // ____Block 3
        t.setCodec(cw51);
        t << d << "\r\n";
        t << cw51->fromUnicode(d) << "\r\n";
        t.flush();
      }
      ff.close();
    
      QCoreApplication::quit();
    }
    
  • input.txt (UTF-8 without BOM)

Użytkownik niezalogowany

Not logged-in user

Не зарегистрированный

  • output.txt (multi code page blocks)

____Block 1:

Użytkownik niezalogowany

Not logged-in user

Не зарегистрированный

Użytkownik niezalogowany

Not logged-in user

Не зарегистрированный

____Block 2:

U࠹tkownik niezalogowany

Not logged-in user

?? ??????????????????

U?ytkownik niezalogowany

Not logged-in user

?? ??????????????????

____Block 3:

U࠹tkownik niezalogowany

Not logged-in user

?? ??????????????????

U?ytkownik niezalogowany

Not logged-in user

?? ??????????????????

It appears that the text can only be saved as UTF-8, which is not suitable for me: I need to use code pages Windows-1251 and Windows-1250.

Is it possible in Qt5 to convert from UTF-8 to other code pages?

Iuliu
killdaclick
  • I see you have multiple codepages in one file, which is, frankly speaking, nonsense. What happens if you write the 3 files separately? – Karol S Nov 07 '14 at 15:24
  • It is not nonsense. The code which converts the texts is part of a bigger compiler which prepares code for various microcontrollers and DSPs. It is very convenient to have all texts in one file, which is in fact a C (*.c) table declaration file - check the link http://pastebin.com/ck8bpifw. It can support any number of languages - if I had several files I would have to produce many ifdefs and modify the compiler. What happens? It works if I save it to separate files - but it WORKED in Qt4 before the way I described: 1) setCodec() 2) flush() 3) setCodec() ... etc.! – killdaclick Nov 07 '14 at 15:41
  • Where does the final output come from? If it's copy-pasted from an editor, then the editor cannot handle multiple encodings. I suggest that you check the file using a hex editor. – Karol S Nov 07 '14 at 16:29
  • I pasted the wrong one, encoded entirely as UTF-8, from the editor. The proper one pasted from the editor (for code page Windows-1250) is here: http://pastebin.com/rgd7Pfj7 and the source file: http://sendfile.pl/download.php?id=134515 . I made some tests and it appears that for some reason Qt5 can't switch the code page on the fly using setCodec(). In Qt4 everything was working all right. If I destroy the QTextStream object, create it again for the same QIODevice and set the code page, it works as in Qt4: http://pastebin.com/EpWV2geT – killdaclick Nov 07 '14 at 16:50
  • I checked the Qt5 sources and found that when the first QTextStream::flush() is triggered (no matter whether it is forced by the user or by the system), QIcuCodec::convertToUnicode is executed and the converter is initialized. After that the program checks whether the converter is initialized, and if it is, the new codec is not loaded - line 5 of http://pastebin.com/2dEcCyET . When I used the debugger to force the executable to jump to line 7 (even though the converter was initialized before), the conversion worked as in Qt4 - setCodec() on the fly! – killdaclick Nov 07 '14 at 17:18
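
Following the suggestion in the comments to inspect the file with a hex editor, here is a minimal sketch in standard C++ (no Qt) of a byte dump that makes the encoding of each block visible. The helper names `readBytes` and `toHex` are made up for illustration. For the Polish sample, `ż` should appear as `bf` in a Windows-1250 block and as `c5 bc` in a UTF-8 block, while `3f` is a literal `?` written in place of a character the codec could not convert.

```cpp
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: read a whole file in binary mode, so that no
// newline or encoding translation is applied to the bytes.
std::vector<unsigned char> readBytes(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::binary);
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}

// Hypothetical helper: format bytes as space-separated two-digit hex values.
std::string toHex(const std::vector<unsigned char>& bytes)
{
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(bytes[i]));
        if (i != 0)
            out += ' ';
        out += buf;
    }
    return out;
}
```

Printing `toHex(readBytes("output.txt"))` shows the raw bytes regardless of which code page a text editor guesses for the file.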

1 Answer


There is a bug in Qt 5 which I've reported to Qt: https://bugreports.qt.io/browse/QTBUG-42498

At the moment, a workaround is to create a new QTextStream object every time you want to change the code page. After QTextStream::flush() has been executed it is NOT possible to change the code page with QTextStream::setCodec(); see the description of the bug in the link above. The problem is in line 5 of the source of QIcuCodec::getConverter(): http://pastebin.com/2dEcCyET

So code written this way does not work in Qt 5 (although it did work in Qt 4.8.4):

QFile f;
QTextStream ts(&f);
ts.setCodec("Windows-1250");
ts << englishTranslationBlock();
ts << polishTranslationBlock();
ts.flush();
ts.setCodec("Windows-1251");
ts << russianTranslationBlock();
ts.flush();
f.close();

To work around the reported bug, the code must create a new QTextStream each time the codec needs to change. The code will work when written this way:

QFile f;
QTextStream* ts = new QTextStream(&f);
ts->setCodec("Windows-1250");
*ts << englishTranslationBlock();
*ts << polishTranslationBlock();
ts->flush();
delete ts;
// A fresh stream is needed before switching the codec.
ts = new QTextStream(&f);
ts->setCodec("Windows-1251");
*ts << russianTranslationBlock();
ts->flush();
delete ts;
f.close();
Renat Zaripov
killdaclick