10

I have a Debian Linux server that I use for a variety of things. I want it to run some web-scraping jobs that I need done regularly.

This code can be found here.

import sys  
from PyQt4.QtGui import *  
from PyQt4.QtCore import *  
from PyQt4.QtWebKit import *  

class Render(QWebPage):  
  def __init__(self, url):  
    self.app = QApplication(sys.argv, False)  # Line updated based on mata's answer
    QWebPage.__init__(self)  
    self.loadFinished.connect(self._loadFinished)  
    self.mainFrame().load(QUrl(url))  
    self.app.exec_()  

  def _loadFinished(self, result):  
    self.frame = self.mainFrame()  
    self.app.quit()  

A simple test of it would look like this:

url = 'http://example.com'
print Render(url).frame.toHtml()

On the call to the constructor it dies with this message (it's printed to stdout, not an uncaught exception).

: cannot connect to X server 

How can I use Python (2.7), Qt4, and WebKit on a headless server? Nothing ever needs to be displayed, so I can tweak any settings that need to be tweaked.

I've looked into alternatives, but this is the best fit for me and my projects. If I did have to install an X server, how could I do it with minimal overhead?

Brigand

5 Answers

21

One of the constructors of QApplication takes a boolean argument, GUIenabled.
If you use that, you can instantiate QApplication without an X server, but you can't create QWidgets.

So in this case the only option is to use a virtual X server like Xvfb to render the GUI.

Xvfb can be installed and run using these commands (assuming you have apt-get installed). The code in the original question is in a file called render.py.

sudo apt-get install xvfb
xvfb-run python render.py
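
As a rough sketch, the same invocation could be wrapped from Python, for example when the scraping job is started by another script. The helper names `xvfb_command` and `run_headless` are hypothetical, and the `-a` flag (which asks xvfb-run to pick a free display number) is an addition that the commands above do not use.

```python
import subprocess


def xvfb_command(script, python="python"):
    """Build the argv list that runs `script` under a virtual X server."""
    # -a lets xvfb-run choose an unused display number. This flag is an
    # assumption added here so that several jobs can run side by side.
    return ["xvfb-run", "-a", python, script]


def run_headless(script):
    """Run the script under xvfb-run and return its exit code."""
    return subprocess.call(xvfb_command(script))
```

Calling `run_headless("render.py")` should behave like the second command above, assuming xvfb is installed.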
mata
  • It gave me "QWidget: Cannot create a QWidget when no GUI is being used". Do you have an idea how to fix it? I'll check out Xvfb, just in case. – Brigand Nov 04 '12 at 01:39
  • Sorry, I didn't really check it and I somehow seemed to remember that you just can't show widgets in headless mode but instantiate them. So if you need to use Qt, you'll have to go with Xvfb. – mata Nov 04 '12 at 01:55
  • xvfb works great! I was worried I'd have to install all of X11, and have a server running. Thanks! I updated your answer with what worked for me. – Brigand Nov 04 '12 at 05:15
  • @mata Where did you read that the constructor for QApplication takes an argument GUIenabled? I can't find anything about that. – GreySage Jan 20 '17 at 23:26
  • @GreySage - I've updated the link. Note that this is only valid for PyQt4, on PyQt5 that argument is not supported anymore, probably because it doesn't make a lot of sense in the first place. Better to use QCoreApplication instead. – mata Jan 22 '17 at 10:44
  • Except that webkit relies on the gui application, even if you don't use any widgets it will complain if you try to use the QCoreApplication. – GreySage Jan 23 '17 at 16:39
  • @GreySage The GUIenabled parameter doesn't help either in that situation, so you either need to use a virtual X server, or for PyQt5 you can try the `-platform minimal` approach as suggested below. – mata Jan 24 '17 at 00:30
  • Unfortunately, it does not work too well when the application is moderately complex: https://bitbucket.org/brainstorm/flatcam/addon/pipelines/home#!/results/17 – brainstorm May 31 '18 at 09:06
6

On GitLab CI/CD, adding ['-platform', 'minimal'] and using xvfb didn't work for me. Instead, I set the variable QT_QPA_PLATFORM: "offscreen".

See https://stackoverflow.com/a/55442821/6000005
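
A minimal sketch of the same idea when the variable is set from Python instead of the CI configuration. It assumes PyQt5 and that the variable is set before the QApplication is created; `use_offscreen_platform` and `make_app` are hypothetical helper names, not part of Qt.

```python
import os
import sys


def use_offscreen_platform(env=os.environ):
    """Select Qt's offscreen platform plugin unless one is already chosen."""
    env.setdefault("QT_QPA_PLATFORM", "offscreen")
    return env["QT_QPA_PLATFORM"]


def make_app(argv=None):
    """Create a QApplication without a display (assumes PyQt5 is installed)."""
    use_offscreen_platform()
    # Imported lazily so the environment variable is in place first.
    from PyQt5.QtWidgets import QApplication
    return QApplication(sys.argv if argv is None else argv)
```

Using `setdefault` keeps any platform that was already chosen in the environment, so a CI-level setting still takes precedence.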

azzamsa
  • This worked for me and seems to be the current solution. I have used `xvfb` in the past (years), which doesn't do it any more, but setting the platform target as described here does. – bossi May 05 '20 at 14:05
5

If PyQt5 is an option, Qt 5 has the "minimal" platform plugin.

To use it, modify the argv passed to QApplication to include ['-platform', 'minimal'].
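
One way this might look, as a sketch that assumes PyQt5 is installed. The helper names `with_minimal_platform` and `make_app` are made up for illustration.

```python
import sys


def with_minimal_platform(argv):
    """Return a copy of argv that asks Qt 5 for the "minimal" platform plugin."""
    argv = list(argv)
    # Leave argv alone if the caller already picked a platform.
    if "-platform" not in argv:
        argv += ["-platform", "minimal"]
    return argv


def make_app():
    """Create the application (assumes PyQt5 is installed)."""
    from PyQt5.QtWidgets import QApplication
    return QApplication(with_minimal_platform(sys.argv))
```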

Artur Gaspar
1

If all you are trying to do is get the webpage, you could use

import urllib
urllib.urlopen('http://example.com').read()
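
The snippet above is Python 2. A rough Python 3 equivalent, where the function moved to `urllib.request`, might look like the sketch below. The name `fetch_html` and the UTF-8 fallback are assumptions, and as with the original, no JavaScript is executed.

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Return the body of `url` as text; scripts on the page are not run."""
    with urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset
        # (an assumption; the real encoding may differ).
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)
```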
pydsigner
  • Good general answer, but I like to have the JavaScript. Thanks. – Brigand Nov 04 '12 at 02:28
  • Yes. HTML, CSS, JavaScript, images, etc. It's exactly like going to the site in Chrome or Safari (they both use WebKit). – Brigand Nov 04 '12 at 05:10
  • It seems I may have misunderstood what you were trying to do. Are you wanting to actually display the webpage? Your example led me to believe that you only wanted the HTML. – pydsigner Nov 06 '12 at 00:21
  • Python WebKit lets you run queries on the page (CSS2-like selectors), execute JavaScript, etc. You could do what I want with the HTML and BeautifulSoup, but I like the completeness. – Brigand Nov 06 '12 at 06:49
  • OK, then @mata's version is what you want. – pydsigner Nov 06 '12 at 18:16
  • The main limitation of BeautifulSoup is that it ignores JavaScript, which is why the OP was led to WebKit, just like me I'm sure. – GreySage Jan 20 '17 at 21:58
1

PhantomJS is a WebKit-based solution that also runs headless. Try it out.

If you are keen on using WebKit yourself, you could also try the PySide version of Qt.

mAsT3RpEE