
I wrote code in PyQt4 that scrapes a website and its inner frames.

```python
import sys, signal
from PyQt4 import QtGui, QtCore, QtWebKit

class Sp():
    def save(self, ok, frame=None):
        # Called when the main frame or an inner frame (iframe) finishes loading.
        if frame is None:
            print('main-frame')
            frame = self.webView.page().mainFrame()
        else:
            print('child-frame')
        print('URL: %s' % frame.baseUrl().toString())
        print('METADATA: %s' % frame.metaData())
        print('TAG: %s' % frame.documentElement().tagName())
        print('HTML: ' + frame.toHtml())
        print()

    def handleFrameCreated(self, frame):
        # Dump each inner frame once it has finished loading.
        frame.loadFinished.connect(lambda ok: self.save(ok, frame=frame))

    def main(self):
        self.webView = QtWebKit.QWebView()
        self.webView.page().frameCreated.connect(self.handleFrameCreated)
        self.webView.page().mainFrame().loadFinished.connect(self.save)
        self.webView.load(QtCore.QUrl("http://www.w3schools.com/tags/tryit.asp?filename=tryhtml_iframe_scrolling"))

# Restore the default SIGINT handler so Ctrl+C terminates the Qt event loop.
signal.signal(signal.SIGINT, signal.SIG_DFL)
print('Press Ctrl+C to quit\n')
app = QtGui.QApplication(sys.argv)  # must be created in the main thread
s = Sp()
s.main()
sys.exit(app.exec_())
```

This code depends on creating an instance of `QApplication` and exiting it accordingly.
The problem is that `QApplication` must be created and exited in the main thread, and I don't have access to the main thread in the project I'm developing.
Is it possible to avoid the error “QApplication was not created in main() thread” in some way?
Maybe by rewriting the code so that it works without `QApplication`, or by somehow making `QApplication` work outside the main thread?
Edit: I can modify the main thread as long as it doesn't interfere with its flow of execution. For example, `app = QtGui.QApplication([])` wouldn't block the flow, but a function that blocks until some code in another thread finishes would not be acceptable.
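For illustration, here is a minimal standard-library sketch (Python 3.4+, which added `threading.main_thread()`) of the condition this error is about. `on_main_thread` is a hypothetical helper, and it assumes the interpreter's main thread is the thread Qt treats as its main thread, which is usually the case. A check like this could let the code detect the situation up front instead of triggering the Qt error.

```python
import threading


def on_main_thread():
    """Return True if the caller runs on the interpreter's main thread.

    Normally this is also the thread Qt expects QApplication to be created on.
    """
    return threading.current_thread() is threading.main_thread()


if __name__ == "__main__":
    print(on_main_thread())  # True: top-level script code runs on the main thread

    # Code started in a worker thread (the situation in the question) fails the check.
    results = []
    worker = threading.Thread(target=lambda: results.append(on_main_thread()))
    worker.start()
    worker.join()
    print(results)  # [False]
```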

yuval
  • Are you 100% sure that using `QWebView` is the way to go? Couldn't you avoid using that class altogether? It doesn't look like you are actually displaying any content so I don't get what's the point of using a GUI framework to do that. – Bakuriu Feb 29 '16 at 15:21
  • I am displaying the content `print('HTML: ' + frame.toHtml())`. – yuval Feb 29 '16 at 15:27
  • If you have any other suggestions for doing this then please show me – yuval Feb 29 '16 at 16:45
  • @yuval. The `QApplication` **must** be run in the main thread, and **all** GUI operations must also be done in the main thread. That is just the way Qt works, and there's nothing you can do about it. The simplest workaround would probably be to run the scraper in a separate *process*, and then use a local socket (or whatever) to send the output back to the main program. – ekhumoro Feb 29 '16 at 20:26
  • That is actually a very good idea! – yuval Mar 01 '16 at 04:50
  • I used the multiprocessing module in Python to get around this. – SurpriseDog Apr 04 '22 at 21:05
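A minimal sketch of the separate-process workaround suggested in the comments. It assumes the scraper above is saved as a standalone script (hypothetically `scraper.py`) and changed to call `app.quit()` once it has printed what it needs; as written it runs until Ctrl+C, so the child would never exit. It uses `subprocess` rather than the `multiprocessing` module mentioned in the last comment, and a temporary file instead of a local socket. `start_scraper` and `poll_scraper` are hypothetical names. Because the child is a fresh interpreter, `QApplication` lives on that process's own main thread, and `poll_scraper` never blocks, so the host program can call it from its own loop or timer without holding up its main thread.

```python
import subprocess
import sys
import tempfile


def start_scraper(argv):
    """Start `python <argv...>` in a child process and return immediately.

    The child's stdout/stderr go to an anonymous temporary file rather than
    a pipe, so a child that prints a lot of HTML cannot stall on a full pipe
    buffer while the parent is busy elsewhere.
    """
    out = tempfile.TemporaryFile()
    proc = subprocess.Popen([sys.executable, *argv],
                            stdout=out, stderr=subprocess.STDOUT)
    return proc, out


def poll_scraper(proc, out):
    """Return the child's output if it has exited, otherwise None.

    proc.poll() only checks the exit status, so this never blocks.
    """
    if proc.poll() is None:
        return None
    out.seek(0)
    text = out.read().decode("utf-8", errors="replace")
    out.close()
    return text


if __name__ == "__main__":
    # Demo with a stand-in child; a real call might be start_scraper(["scraper.py"]).
    proc, out = start_scraper(["-c", "print('HTML: <html></html>')"])
    proc.wait()  # demo only: the host program would keep polling instead
    print(poll_scraper(proc, out), end="")
```

In the host program, `start_scraper` would be called once and `poll_scraper` checked periodically until it returns the captured text, which could then be parsed for the `URL:`/`HTML:` lines the scraper prints.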
