I'm trying to download a web page's data into a string so that I can parse it. I couldn't find any suitable methods in QWebView, QUrl, or the other classes I looked at. Could you help me? Linux, C++, Qt.

EDIT:

Thanks for the help. The code works, but some pages have a broken charset after downloading. I tried something like this to fix it:

QNetworkRequest *request = new QNetworkRequest(QUrl("http://ru.wiktionary.org/wiki/bovo"));

request->setRawHeader( "User-Agent", "Mozilla/5.0 (X11; U; Linux i686 (x86_64); "
                       "en-US; rv:1.9.0.1) Gecko/2008070206 Firefox/3.0.1" );
request->setRawHeader( "Accept-Charset", "win1251,utf-8;q=0.7,*;q=0.7" );
request->setRawHeader( "charset", "utf-8" );
request->setRawHeader( "Connection", "keep-alive" );

manager->get(*request);

No results =(.

genesis
Max Frai

3 Answers

Have you looked at QNetworkAccessManager? Here's a rough and ready sample illustrating usage:

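#include <QObject>
#include <QNetworkAccessManager>
#include <QNetworkReply>
#include <QNetworkRequest>
#include <QUrl>

class MyClass : public QObject
{
    Q_OBJECT

public:
    MyClass();
    void fetch();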

public slots:
    void replyFinished(QNetworkReply*);

private:
    QNetworkAccessManager* m_manager;
};


MyClass::MyClass()
{
    m_manager = new QNetworkAccessManager(this);

    connect(m_manager, SIGNAL(finished(QNetworkReply*)),
         this, SLOT(replyFinished(QNetworkReply*)));

}

void MyClass::fetch()
{
    m_manager->get(QNetworkRequest(QUrl("http://stackoverflow.com")));
}

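void MyClass::replyFinished(QNetworkReply* pReply)
{
    // The reply is a sequential device: the bytes can only be read once.
    QByteArray data = pReply->readAll();

    // Assumes the page is UTF-8; pick a matching codec for other charsets.
    QString str = QString::fromUtf8(data);

    // Process str any way you like!

    // The receiver is responsible for deleting the reply.
    pReply->deleteLater();
}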

In your handler for the finished signal you will be passed a QNetworkReply object, which you can read the response from because it inherits from QIODevice. A simple way to do this is to call readAll to get a QByteArray. You can then construct a QString from that QByteArray and do whatever you want with it.
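If the page is not UTF-8 (the broken charset mentioned in the question's edit), one option is to read the charset the server declares in its Content-Type header and decode the bytes with a matching codec. The helper below is only a sketch of the header parsing in plain C++. The name charsetFromContentType and the "utf-8" fallback are my own assumptions, not part of Qt.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (not a Qt API): extracts the charset parameter from a
// Content-Type header value such as "text/html; charset=windows-1251".
// Returns `fallback` when the header carries no charset parameter.
std::string charsetFromContentType(const std::string& contentType,
                                   const std::string& fallback = "utf-8")
{
    // Work on a lower-cased copy so the search is case-insensitive.
    std::string lower(contentType);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    const std::string key = "charset=";
    const std::string::size_type pos = lower.find(key);
    if (pos == std::string::npos)
        return fallback;

    std::string::size_type start = pos + key.size();
    // The value may be quoted, e.g. charset="utf-8".
    if (start < lower.size() && lower[start] == '"')
        ++start;

    // The value ends at a closing quote, a separator, or the end of the header.
    std::string::size_type end = lower.find_first_of("\";, \t", start);
    if (end == std::string::npos)
        end = lower.size();

    const std::string value = lower.substr(start, end - start);
    return value.empty() ? fallback : value;
}
```

Inside replyFinished, the header value should be available through pReply->header(QNetworkRequest::ContentTypeHeader).toString(), and the resulting name could be handed to QTextCodec::codecForName() before calling toUnicode() on the data. I have not tested that combination here, and codecForName() can return a null pointer for unknown names, so check the result. This also does not cover pages that declare their charset only in an HTML meta tag.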

Azendale
Paul Dixon
  • Thanks for answering. But I got an error: Object::connect: No such slot MainWindow::replyFinished(QNetworkReply*) – Max Frai Jun 27 '09 at 16:32
  • You need to add a slot to the receiving class with the signature void replyFinished(QNetworkReply*) – Idan K Jun 27 '09 at 16:33
  • Sorry, I understand now. But I still don't know how to read the data. Help me, please :) – Max Frai Jun 27 '09 at 16:37
  • Inside your replyFinished slot, call readAll() on the QNetworkReply argument; you'll get back a QByteArray. – Idan K Jun 27 '09 at 17:06
  • I tried this: manager->get(QNetworkRequest(QUrl("http:/stackoverflow.com")))->readAll().constData(); It always returns an empty string. Why? – Max Frai Jun 27 '09 at 18:07
  • It's no use reading from the QNetworkReply object until the QNetworkAccessManager emits the finished signal (the one connected to replyFinished in the sample). If you're not familiar with signal and slot handling, look it up in the Qt manual. – Paul Dixon Jun 27 '09 at 18:21
  • This won't work in general because "QNetworkReply is a sequential-access QIODevice, which means that once data is read from the object, it is no longer kept by the device. It is therefore the application's responsibility to keep this data if it needs to." Often you will find readAll() returns nothing because the content has already been read. – hoju Sep 28 '10 at 11:23
  • The code I posted only reads it once - you'll get a fresh QNetworkReply whenever that fetch method is called. – Paul Dixon Sep 28 '10 at 23:40
  • Yes it works in isolation, but not if you try combining with a QWebView to render the webpage: http://stackoverflow.com/questions/2968482/qt-jambi-accessing-the-content-of-qnetworkreply – hoju Sep 29 '10 at 00:12
  • Rather than "it won't work in general", it's that it won't work if you're using the QNetworkAccessManager in conjunction with another class which might be trying to consume the data as it loads. That is of course correct, but my sample doesn't do that, and I believe the OP was only interested in obtaining the response to parse it, not rendering it in a QWebView. It's a perfectly valid technique. However, it's been a year since I wrote much Qt code, and there might be some more concise techniques in recent releases... – Paul Dixon Sep 29 '10 at 00:23
  • According to the documentation `QString str(data);` will not work as expected unless the response is in Latin 1 encoding (unless you set something else earlier). – Joey Apr 16 '12 at 16:45

Paul Dixon's answer is probably the best approach, but Jesse's answer does touch on something worth mentioning.

cURL, or more precisely libcurl, is a wonderfully powerful library. There's no need to execute shell scripts and parse their output: libcurl is available for C, C++, and more languages than you can shake a URL at. It might be useful if you are doing some unusual operation (like an HTTP POST over SSL?) that Qt doesn't support.

C-o-r-E

Have you looked into lynx, curl, or wget? In the past I have needed to grab and parse info from a website, sans db access, and if you are trying to get dynamically formatted data, I believe this would be the quickest way. I'm not a C guy, but I assume there is a way to run shell scripts and grab the data, or at least to get the script running and read the output from a file after it has been written. Worst case, you could run a cron job and have your C code check for a "finished" line at the end of the written file, but I doubt that will be necessary. I suppose it depends on what you need it for, but if you just want the output HTML of a page, something as easy as wget piped to awk or grep can work wonders.
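For what it's worth, running a command and grabbing its output from C++ can be sketched with POSIX popen(), which should be available on Linux. The function name runCommand is made up for illustration, and the wget command line in the note below is only an example.

```cpp
#include <stdio.h>    // popen, pclose, fread (POSIX)

#include <array>
#include <stdexcept>
#include <string>

// Hypothetical helper: runs a shell command through POSIX popen() and returns
// everything the command wrote to its standard output.
std::string runCommand(const std::string& command)
{
    FILE* pipe = popen(command.c_str(), "r");
    if (!pipe)
        throw std::runtime_error("popen() failed for: " + command);

    std::string output;
    std::array<char, 4096> buffer;
    size_t bytesRead = 0;

    // Read raw bytes until the command closes its end of the pipe.
    while ((bytesRead = fread(buffer.data(), 1, buffer.size(), pipe)) > 0)
        output.append(buffer.data(), bytesRead);

    pclose(pipe);
    return output;
}
```

Assuming wget is installed, runCommand("wget -q -O - http://stackoverflow.com") would return the page as a string. The command goes through the shell, so never build it from untrusted input. Inside a Qt application, QProcess is probably the more natural way to do the same thing.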

Jesse