0

Is there any way to load a URL and manipulate the page's DOM without rendering the page? I would like to do it programmatically, without showing the page itself in the browser.

user63898

1 Answer

4

I believe you should be able to load the web page using QNetworkAccessManager and manipulate its content using QTextDocument; below is a small example. You can also use the QWebPage class without showing the page contents, since a QWebPage does not need to be attached to a view; I included that in the example below as well:

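#include <QNetworkAccessManager>
#include <QNetworkRequest>
#include <QNetworkReply>
#include <QTextDocument>
#include <QTextBlock>
#include <QWebPage>
#include <QWebFrame>
#include <QWebElement>
#include <QDebug>

void MainWindow::on_pushButton_clicked()
{
    // load the web page; the manager is parented to the window and is freed with it
    QNetworkAccessManager *manager = new QNetworkAccessManager(this);
    connect(manager, SIGNAL(finished(QNetworkReply*)), this, SLOT(replyFinished(QNetworkReply*)));
    manager->get(QNetworkRequest(QUrl("http://www.google.com")));
}

void MainWindow::replyFinished(QNetworkReply* reply)
{
    // the receiver owns the reply; schedule it for deletion once this slot returns
    reply->deleteLater();
    if (reply->error() != QNetworkReply::NoError)
    {
        qDebug() << reply->errorString();
        return;
    }

    // assumes the page is UTF-8 encoded
    QString content = QString::fromUtf8(reply->readAll());

    // process the network reply using QTextDocument
    QTextDocument page;
    page.setHtml(content);
    for (QTextBlock block = page.begin(); block.isValid(); block = block.next())
    {
        // do something here
        qDebug() << block.text();
    }

    // process the network reply using QWebPage (never shown, no view attached)
    QWebPage webPage;
    webPage.mainFrame()->setHtml(content);

    QWebElement document = webPage.mainFrame()->documentElement();
    // "element_name" is a placeholder; pass any CSS selector, e.g. "a" or "div.header"
    QWebElementCollection elements = document.findAll("element_name");

    foreach (QWebElement element, elements)
    {
        // do something here
        qDebug() << element.toPlainText();
    }
}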

Hope this helps, regards

serge_gubenko
  • Bumping a year old answer ... How would you go about parsing this without QWebPage? To be more precise, what if this code was not in the main thread where you can't create QWebPage? – liliumdev Aug 01 '11 at 23:34