
I am trying to make a class that can take either a list of URLs or a single URL and render them.

In the list case it renders them all and makes all the resulting HTML available in `self.pages`. This works fine.

In the single case it takes a URL, renders it, makes the HTML available as an attribute, and then quits. This works fine when I run it once, but when I call it two or more times it locks up when it calls app.exec_().

import sys
from PySide.QtCore import *
from PySide.QtGui import *
from PySide.QtWebKit import QWebPage, QWebFrame

class Renderer(QWebPage):
    def __init__(self):
        self.app = QCoreApplication.instance()
        if self.app == None:
            self.app = QApplication(sys.argv)
            self.app.Type(QApplication.Tty)
        QWebPage.__init__(self)
        self.pages = []

    def start(self, urls):
        #for lists
        try:
            self.loadFinished.disconnect()
        except Exception:
            pass
        self.loadFinished.connect(self.listFinished)
        self._urls = iter(urls)
        self.fetchNext()
        self.app.exec_()

    def fetchNext(self):
        #for lists
        try:
            url = next(self._urls)
        except StopIteration:
            return False
        else:
            self.mainFrame().load(QUrl(url))
        return True

    def listFinished(self):
        #for lists
        html = self.processCurrentPage()
        self.pages.append(html)
        if not self.fetchNext():
            self.app.quit()

    def processCurrentPage(self):
        url = self.mainFrame().url().toString()
        html = self.mainFrame().toHtml()
        return html

    def render(self, url):
        try:
            self.loadFinished.disconnect()
        except Exception:
            pass
        self.loadFinished.connect(self.singleFinished)
        self._url = url
        self.mainFrame().load(QUrl(url))
        self.app.exec_()

    def singleFinished(self):
        print "singleFinished"
        html = self.processCurrentPage()
        self._html = html
        self.app.quit()

Is what I'm trying to do possible? How can I fix this code so I can call render() multiple times? Should I just use the list-based version?

The same problem occurs when I try the list case and then the single case. I'm pretty sure it doesn't like me calling exec_() after quit(), but I haven't found any documentation on this.

GreySage
  • In the constructor of your class you create an application instance, and you close it in the render method. That happens with the first URL, so it does not close the application. – eyllanesc Jan 25 '17 at 22:27
  • @eyllanesc So how can I return control back to the caller without using app.quit()? – GreySage Jan 25 '17 at 22:30
  • Could you share the entire code? I cannot reproduce it. – eyllanesc Jan 25 '17 at 22:33
  • I expanded the code with the list-case stuff, but it shouldn't make a difference. I get the lock-up when calling `r = Renderer(); r.render(url1); r.render(url2)` – GreySage Jan 25 '17 at 22:38
  • @ekhumoro What irony? I read that question, used your answer (even though it didn't work without doing some work on it), and now I have a completely different problem. I don't think you understand what the word 'duplicate' means. – GreySage Jan 25 '17 at 23:32
  • Except that it's not creating multiple application objects. That's why I used .instance() way up there in the constructor. If it created multiple application objects, then I would get the behavior described in the question that you seem to think solves everything (hint: it doesn't), which I don't. It is in fact very obvious that the code doesn't create multiple application objects, because the list case (which features a workflow that includes the constructor, which is the only place an application is created) works. So if you're not going to stay on-topic or be helpful, please don't post. – GreySage Jan 25 '17 at 23:52
  • @GreySage. I have removed my comments as they aren't constructive, and I don't want our exchanges to get out of hand. I see that you have now removed your comment from my answer, which is appreciated, as it was the only thing that really bothered me. Can I suggest that you also remove your last two comments from here? – ekhumoro Jan 26 '17 at 01:07

1 Answer


Re-creating or re-using a QApplication object can sometimes cause problems, depending on the platform and/or the specific versions of PyQt/PySide in use.

I have therefore adapted the example code so that it uses a local event loop rather than repeatedly quitting and restarting the application event loop. This local loop should time out after thirty seconds if the page doesn't load.

Note that the original example attempts to create a console application (but using the wrong syntax). However, a full GUI application is needed for rendering web pages (in fact, trying to do otherwise simply dumps core on my system). The code was tested using Python 2 and Python 3 with PySide 1.2.4 and PyQt 4.12 on Arch Linux (running from a normal console).
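The core of this approach, a private QEventLoop that is ended either by the work finishing or by a single-shot QTimer, does not depend on QtWebKit. Below is a minimal sketch of just that pattern using QtCore only. It assumes a newer binding (PySide6, falling back to PyQt5) rather than the PySide 1.2.4 used here, and the names `wait_with_timeout`, `fake_load` and `exec_loop` are hypothetical; `fake_load` merely stands in for `mainFrame().load()` by "finishing" after a fixed delay.

```python
import sys

# Assumption: a modern Qt binding is installed; PySide6 is tried first, then PyQt5.
try:
    from PySide6.QtCore import QCoreApplication, QEventLoop, QTimer
except ImportError:
    from PyQt5.QtCore import QCoreApplication, QEventLoop, QTimer


def exec_loop(loop):
    """Run `loop`; newer bindings call the method exec(), older ones exec_()."""
    return loop.exec() if hasattr(loop, "exec") else loop.exec_()


def wait_with_timeout(start, timeout_ms):
    """Run a private event loop until `start`'s completion callback fires.

    `start` receives a zero-argument callback and must begin the work
    asynchronously. Returns 0 if the work finished, 1 if it timed out.
    The application's own event loop is never started or quit.
    """
    loop = QEventLoop()
    timer = QTimer()
    timer.setSingleShot(True)
    timer.timeout.connect(lambda: loop.exit(1))  # exit code 1: timed out
    start(lambda: loop.exit(0))                  # exit code 0: finished
    timer.start(timeout_ms)
    code = exec_loop(loop)
    timer.stop()
    return code


def fake_load(delay_ms):
    """Hypothetical stand-in for mainFrame().load(): 'finishes' after delay_ms."""
    def start(done):
        QTimer.singleShot(delay_ms, done)
    return start


if __name__ == "__main__":
    # Create the application once, or reuse it if it already exists.
    app = QCoreApplication.instance() or QCoreApplication(sys.argv)
    # Consecutive waits work because app.exec()/app.quit() are never used.
    print(wait_with_timeout(fake_load(50), 1000))   # prints 0
    print(wait_with_timeout(fake_load(50), 1000))   # prints 0
    print(wait_with_timeout(fake_load(5000), 100))  # prints 1
```

The work is started asynchronously on purpose: Qt appears to clear a pending `exit()` request when `exec()` begins, so an `exit()` issued before the loop is running would be lost.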

import sys
from PySide.QtCore import *
from PySide.QtGui import *
from PySide.QtWebKit import QWebPage, QWebFrame
# from PyQt4.QtCore import *
# from PyQt4.QtGui import *
# from PyQt4.QtWebKit import QWebPage, QWebFrame

class Renderer(QWebPage):
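    def __init__(self):
        # Reuse the existing application object if there is one, so that
        # only a single QApplication is ever created.
        self.app = QApplication.instance()
        if self.app is None:
            self.app = QApplication(sys.argv)
        super(Renderer, self).__init__()
        self.mainFrame().loadFinished.connect(self.handleLoadFinished)
        # Private event loop used to wait for each page; the application's
        # own event loop is never started or quit.
        self.loop = QEventLoop()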

    def render(self, urls):
        self._urls = iter(urls)
        self.fetchNext()

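    def fetchNext(self):
        # End the local loop waiting on the previous URL (harmless if idle).
        self.loop.exit(0)
        try:
            url = next(self._urls)
        except StopIteration:
            return False
        else:
            self.mainFrame().load(QUrl(url))
            # Single-shot timer: end the local loop with code 1 after 30 s.
            timer = QTimer()
            timer.setSingleShot(True)
            timer.timeout.connect(lambda: self.loop.exit(1))
            timer.start(30000)
            # Wait here (events are still processed) until loadFinished
            # exits the loop with 0 or the timer exits it with 1.
            if self.loop.exec_() == 1:
                print('url load timed out: %s' % url)
        return True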

    def processCurrentPage(self):
        url = self.mainFrame().url().toString()
        html = self.mainFrame().toHtml()
        print('loaded: [%d bytes] %s' % (self.bytesReceived(), url))

    def handleLoadFinished(self):
        self.processCurrentPage()
        self.fetchNext()

if __name__ == '__main__':

    r = Renderer()
    r.render(['http://en.wikipedia.org/'])
    r.render(['http://stackoverflow.com/'])

Output:

$ python2 test.py
loaded: [863822 bytes] http://en.wikipedia.org/wiki/Main_Page
loaded: [1718852 bytes] http://stackoverflow.com/
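One thing to be aware of: in the code above the next URL is loaded from inside `handleLoadFinished`, that is, while the local loop for the previous URL is still running, so a list with more than one URL makes `fetchNext()` re-enter `self.loop.exec_()` before that same loop has returned. A flatter alternative is to drive the URLs from a plain Python loop and give each one its own local event loop, collecting the results in a dictionary as the question wanted. The sketch below shows that structure with QtCore only; `fetch_one`, `render_urls`, `fake_loader` and `exec_loop` are hypothetical names, and PySide6/PyQt5 are assumed. With QtWebKit, the assumption is that `load(url, on_done)` would call `mainFrame().load(QUrl(url))` and arrange for `on_done(mainFrame().toHtml())` to be called from the `loadFinished` handler.

```python
import sys

# Assumption: PySide6 or PyQt5 is installed (the thread itself used PySide 1.x / PyQt4).
try:
    from PySide6.QtCore import QCoreApplication, QEventLoop, QTimer
except ImportError:
    from PyQt5.QtCore import QCoreApplication, QEventLoop, QTimer


def exec_loop(loop):
    """Run `loop`; newer bindings call the method exec(), older ones exec_()."""
    return loop.exec() if hasattr(loop, "exec") else loop.exec_()


def fetch_one(load, url, timeout_ms):
    """Wait for one load in its own private event loop.

    Returns the loader's result, or None if `timeout_ms` elapses first.
    """
    loop = QEventLoop()
    timer = QTimer()
    timer.setSingleShot(True)
    timer.timeout.connect(lambda: loop.exit(1))  # exit code 1: timed out
    box = []

    def on_done(result):
        box.append(result)
        loop.exit(0)  # exit code 0: finished

    load(url, on_done)  # must start the work asynchronously
    timer.start(timeout_ms)
    code = exec_loop(loop)
    timer.stop()
    return box[0] if code == 0 and box else None


def render_urls(urls, load, timeout_ms=30000):
    """Accept a single URL or a list of URLs; return {url: html or None}.

    Each URL is started only after the previous loop has returned, so no
    loop is ever re-entered, and the application loop is never quit.
    """
    if isinstance(urls, str):
        urls = [urls]
    return {url: fetch_one(load, url, timeout_ms) for url in urls}


def fake_loader(delays):
    """Hypothetical loader: 'loads' a URL after delays[url] ms."""
    def load(url, on_done):
        QTimer.singleShot(delays[url], lambda: on_done("<html>%s</html>" % url))
    return load


if __name__ == "__main__":
    app = QCoreApplication.instance() or QCoreApplication(sys.argv)
    load = fake_loader({"http://a.example/": 50, "http://b.example/": 80})
    # Repeated calls are fine because app.exec()/app.quit() are never used.
    print(render_urls("http://a.example/", load, timeout_ms=1000))
    print(render_urls(["http://a.example/", "http://b.example/"], load, timeout_ms=1000))
```

Because every URL gets a fresh loop and result list, a load that finishes after its timeout only exits its own, already finished loop and cannot end the wait for a later URL.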
ekhumoro