I want to get data from a website whose content is loaded dynamically with jQuery and updates every 5 minutes. The website doesn't have an API, so I have to fetch its HTML and parse it to get what I need. The problem is that the page's source HTML does not contain the dynamically loaded content. I want to do this in C++/Qt. Is this possible?


I want to get the content that is generated by jQuery and is not available in the page source.

leonardo d
  • Possible duplicate of [How to parse HTML with C++/Qt?](https://stackoverflow.com/questions/18676800/how-to-parse-html-with-c-qt) – user9335240 Feb 20 '18 at 19:41
  • @user9335240 It's not a duplicate of that. I want to get the HTML of content that is generated with jQuery. – leonardo d Feb 20 '18 at 19:46
  • Just try to use [`QDomDocument`](http://doc.qt.io/qt-5/qdomdocument.html#details) on the result of [`QNetworkRequest`](http://doc.qt.io/qt-5/qnetworkrequest.html), [`QNetworkReply`](http://doc.qt.io/qt-5/qnetworkreply.html), [`QNetworkAccessManager`](http://doc.qt.io/qt-5/qnetworkaccessmanager.html) – user9335240 Feb 20 '18 at 19:53
  • @user9335240 No, `QDomDocument` cannot parse HTML. It's only for XML (and the documentation doesn't even recommend using it in the first place). – MrEricSir Feb 20 '18 at 20:51
  • @MrEricSir is right, I tried `QDomDocument` and it failed when I tried to parse "www.wikipedia.org". Refer to [this document](https://wiki.qt.io/Handling_HTML); it is old, but may be useful. It suggests some libraries for parsing HTML (but you may still need to fetch the page using `QNetworkAccessManager`, `QNetworkRequest`, and `QNetworkReply`). – user9335240 Feb 20 '18 at 23:15
  • @user9335240 Does this work for jQuery-generated content too? – leonardo d Feb 21 '18 at 00:41
  • The content is generated dynamically using jQuery? Then your solution is to use `QtWebEngine` and listen for the JavaScript call that adds the content, or for a content-added event if one is available. – user9335240 Feb 21 '18 at 00:53
  • @user9335240 Yes, the website whose content I want to get generates that content dynamically using jQuery. It doesn't have an API. Can you explain more? – leonardo d Feb 21 '18 at 00:57
  • You can try to inject JavaScript using [QWebEngineScript](http://doc.qt.io/qt-5/qwebenginescript.html) and `QWebEnginePage`. For example, you can inject a script that listens for document changes and calls a C++ function. [`QWebEnginePage::runJavaScript`](https://doc.qt.io/qt-5.10/qwebenginepage.html#runJavaScript-2) runs a script and invokes a C++ callback when it completes, so you can run a script that reads the generated HTML (after jQuery has done its work) and receive the result in the C++ callback. – user9335240 Feb 21 '18 at 09:23
  • Also, you say that the content is generated using jQuery and the site doesn't have an API. How? What source does jQuery get the data from to put into the HTML? – user9335240 Feb 21 '18 at 09:24
  • @user9335240 it is not mine, and its owner doesn't provide any API for people to use. – leonardo d Feb 21 '18 at 19:23
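
The `runJavaScript` approach suggested in the comments above could be sketched roughly as follows. This is only a minimal illustration, not a tested solution for the site in question: the names `buildOuterHtmlScript` and `fetchRenderedHtml`, the CSS selector, and the fixed delay after `loadFinished` are all assumptions introduced here. The Qt part is wrapped in `__has_include` so that the plain C++ helper still compiles on a machine without Qt WebEngine.

```cpp
#include <string>

// Hypothetical helper: builds a JavaScript expression that returns the
// outerHTML of the first element matching `selector`, or "" if none matches.
// The selector is embedded in a single-quoted JS string, so backslashes and
// single quotes are escaped first.
std::string buildOuterHtmlScript(const std::string &selector)
{
    std::string escaped;
    for (char c : selector) {
        if (c == '\\' || c == '\'')
            escaped += '\\';
        escaped += c;
    }
    return "(function(){var e=document.querySelector('" + escaped +
           "');return e?e.outerHTML:'';})()";
}

#if __has_include(<QWebEnginePage>)
#include <QObject>
#include <QString>
#include <QTimer>
#include <QUrl>
#include <QVariant>
#include <QWebEnginePage>
#include <functional>

// Hypothetical helper: loads `url` in `page`, waits `delayMs` after
// loadFinished so the page's own scripts (e.g. jQuery) can fill in the
// content, then passes the rendered HTML of `selector` to `onHtml`.
// An empty string is passed if the load fails or nothing matches.
// Needs a running QApplication event loop.
void fetchRenderedHtml(QWebEnginePage *page, const QUrl &url,
                       const QString &selector, int delayMs,
                       std::function<void(const QString &)> onHtml)
{
    QObject::connect(page, &QWebEnginePage::loadFinished, page,
        [=](bool ok) {
            if (!ok) {
                onHtml(QString());
                return;
            }
            // Assumption: a fixed delay is enough for the dynamic content.
            QTimer::singleShot(delayMs, page, [=]() {
                const QString js = QString::fromStdString(
                    buildOuterHtmlScript(selector.toStdString()));
                page->runJavaScript(js, [onHtml](const QVariant &result) {
                    onHtml(result.toString());
                });
            });
        });
    page->load(url);
}
#endif
```

To try it, one would create a `QApplication` and a `QWebEnginePage`, call `fetchRenderedHtml` with the target URL and a selector for the element that jQuery fills in, and then start the event loop. The fixed delay is the crudest possible way to wait. An injected script that watches for document changes, as the comment suggests, would likely be more reliable, and re-running the script on a timer would cover the 5-minute updates.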

0 Answers