Here is the link where I got the code for fetching web-page content. But I have a problem: the text I get is in the wrong encoding. How can I correct it? Thanks.

EDIT: I'm trying to get data from this page:

http://ru.wiktionary.org/wiki/example

And this is what I got: http://img44.imageshack.us/img44/6141/kfastwikimainwindow.png

EDIT2: I just save all the data to an HTML file and show it in a QWebView.

Max Frai

2 Answers


I think you're receiving the data in the correct encoding; it's just not being displayed with the correct encoding. I did a quick test, and that's pretty much what it looks like when I display it with the Visual Studio HTML Visualizer, but if I save the data to a file and open it with a browser, it is rendered correctly.

Gerald
  • The browser auto-detects the encoding, so the page looks normal there. – Max Frai Jul 03 '09 at 20:00
  • Maybe I'm not understanding your question. QNetworkAccessManager returns the data as a raw byte array; it doesn't do anything with the encoding. The data is being returned with UTF-8 encoding, and what you do with it is up to you. – Gerald Jul 03 '09 at 20:15
  • Maybe... There may be some trouble with pasting the data into the TextEdit. I should test it. Thanks for the reply. – Max Frai Jul 03 '09 at 20:25
  • The browser knows that it's UTF-8 because of the Content-Type meta tag. The HTML Visualizer isn't that sophisticated; it's only meant to render simple HTML, not to figure out the encoding from the meta tags. I don't think there's anything you can do to change that behavior. – Gerald Jul 03 '09 at 20:26
  • And what if I write the data to a file and then open it in my application using QWebView? What do you think? – Max Frai Jul 03 '09 at 20:38

From what I understand, you retrieve the data as a QByteArray, which by itself does not have (or know about) an encoding. Depending on how you pass the data on for display, it might get treated as local 8-bit, but the website you linked is UTF-8. In this case you can pass it through a QTextCodec to detect and use the correct encoding (QTextCodec::codecForHtml() might be interesting here), or, if you are sure you'll always get websites as UTF-8, use QString::fromUtf8().
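To make the idea concrete, here is a small language-neutral sketch in Python rather than Qt (so none of it is Qt API). It assumes a made-up HTML snippet and a hypothetical helper, `sniff_charset`, which only loosely mimics what a charset detector does by looking for a `charset=` declaration. It shows the same bytes decoded with the wrong encoding and then with the declared one:

```python
import re

# Made-up sample: UTF-8 encoded HTML bytes, as they might arrive from the network.
raw = ('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
       '<p>пример</p>').encode("utf-8")

def sniff_charset(data: bytes, default: str = "utf-8") -> str:
    """Hypothetical helper: find charset=... near the start of the document,
    falling back to `default` when nothing is declared."""
    match = re.search(rb'charset=["\']?([A-Za-z0-9_\-]+)', data[:1024])
    return match.group(1).decode("ascii").lower() if match else default

# Treating the bytes as a single-byte encoding never fails, but yields mojibake.
wrong = raw.decode("latin-1")

# Decoding with the declared charset restores the original text.
right = raw.decode(sniff_charset(raw))

print(wrong)
print(right)
```

The bytes are identical in both cases; only the decoding step differs. In Qt terms, the second decode plays the role that QString::fromUtf8() or a QTextCodec would play before the text is handed to a widget.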

bluebrother
  • 8,636
  • 1
  • 20
  • 21