1

I'm using QWebEnginePage to get the content of a web page. In the .pro file I have specified `CONFIG-=gui`.

However, when I run the program on a headless system, it complains that it cannot connect to display :0. I also noticed that it requires libX11-xcb.so and related libraries.

Is there any way I can get the HTML of a page using QtWebEngine in headless mode without having to use xvfb?

Pragalathan M
  • I think there are other alternatives for getting the content of a web page in Qt. – Redanium Feb 14 '17 at 19:51
  • @Redanium My requirement is to get the html of an ajax site. So I was looking for a headless browser to execute the javascript to generate the HTML. Please suggest if there is any alternative – Pragalathan M Feb 15 '17 at 06:01
  • Why don't you use `QNetworkAccessManager` with `QNetworkReply`? – Redanium Feb 15 '17 at 08:00
  • @Redanium As far as I understand, these classes don't execute the JavaScript and AJAX calls in the web page. – Pragalathan M Feb 18 '17 at 17:01
  • Have a look at this: [Qtwebengine How do I run web engine on a sever without a display?](http://lists.qt-project.org/pipermail/qtwebengine/2015-December/000267.html) and this: [-SOLVED- QtWebEngine headless?](https://forum.qt.io/topic/50228/solved-qtwebengine-headless/3) – Redanium Feb 21 '17 at 20:25
  • @Redanium, it looks like "-platform offscreen" works. I will test and confirm. Thanks. – Pragalathan M Mar 07 '17 at 12:13
  • For some reason I had to switch to QtWebKit, and it has some issue with "-platform offscreen". I need to test it further and will update. – Pragalathan M May 12 '17 at 17:50
  • Since QtWebEngine requires OpenGL, it is not possible to run it without xcb and xvfb. – Pragalathan M Jun 15 '17 at 12:25
  • What about Selenium and PhantomJS? – Redanium Dec 11 '17 at 16:59

2 Answers

2

QtWebEngine is memory hungry compared to QtWebKit (single-process version).

  • QtWebKit didn't render some sites properly.
  • QtWebEngine needed an X server and hence consumed more memory, on top of its multi-process design, which applies even if you have a single tab.

I finally switched to Puppeteer. I know this is not a direct answer to the question posted, but it solves the original problem of extracting the DOM HTML of an AJAX site in true headless mode.

Pragalathan M
1

"My requirement is to get the html of an ajax site. So I was looking for a headless browser to execute the javascript to generate the HTML. "

A spider may satisfy your requirement. With Scrapy and Chrome, you can do anything a browser can do.
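
As a rough illustration of the Chrome half of this idea (without Scrapy), here is a minimal sketch that shells out to headless Chrome and reads back the serialized DOM via its `--dump-dom` switch. The helper names (`find_chrome`, `build_dump_dom_command`, `fetch_rendered_html`) and the list of candidate binary names are assumptions made for this example, not part of any library; adjust them for your system.

```python
import shutil
import subprocess

# Assumed binary names; which one exists depends on the distribution.
CHROME_CANDIDATES = ("google-chrome", "chromium", "chromium-browser", "chrome")


def find_chrome():
    """Return the path of the first Chrome/Chromium binary on PATH, or None."""
    for name in CHROME_CANDIDATES:
        path = shutil.which(name)
        if path:
            return path
    return None


def build_dump_dom_command(chrome_path, url):
    """Build the argument list asking headless Chrome to print the page's DOM."""
    return [chrome_path, "--headless", "--disable-gpu", "--dump-dom", url]


def fetch_rendered_html(url, timeout=60):
    """Load url in headless Chrome and return the DOM it prints to stdout."""
    chrome = find_chrome()
    if chrome is None:
        raise RuntimeError("No Chrome/Chromium binary found on PATH")
    result = subprocess.run(
        build_dump_dom_command(chrome, url),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError if Chrome exits with an error
    )
    return result.stdout
```

Calling `fetch_rendered_html("https://example.com")` should return the HTML as Chrome sees it once the page has loaded, with no X server or xvfb involved. Pages that keep fetching data long after the load event may still need a scripted driver such as Puppeteer or Selenium to wait for specific elements.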

kingsting