I am using PyQt4 with Python 3.
I load a website's source code, including the source of all its iframes, with this code:

import sys, signal, time
from PyQt4 import QtGui, QtCore, QtWebKit

class Sp():
  def save(self, ok, frame=None):
    # Called whenever a frame finishes loading; frame is None for the main frame.
    if frame is None:
      print('main-frame')
      frame = self.webView.page().mainFrame()
    else:
      print('child-frame')
    print('Time: ' + str(time.time() - startTime))
    print('URL: %s' % frame.baseUrl().toString())
    print('METADATA: %s' % frame.metaData())
    print('TAG: %s' % frame.documentElement().tagName())
    print('HTML: ' + frame.documentElement().toInnerXml())
    print()

  def handleFrameCreated(self, frame):
    # Bind the new child frame so save() knows which frame finished loading.
    frame.loadFinished.connect(lambda: self.save(True, frame=frame))

  def main(self):
    self.webView = QtWebKit.QWebView()
    # Connect the signals before loading so no frame is missed.
    self.webView.page().frameCreated.connect(self.handleFrameCreated)
    self.webView.page().mainFrame().loadFinished.connect(self.save)
    self.webView.load(QtCore.QUrl("http://10.0.0.101/default.htm"))

startTime = time.time()
# Restore the default SIGINT handler so Ctrl+C stops the Qt event loop.
signal.signal(signal.SIGINT, signal.SIG_DFL)
print('Press Ctrl+C to quit\n')
app = QtGui.QApplication(sys.argv)
s = Sp()
s.main()
sys.exit(app.exec_())

This code gives me the source code of every iframe in a website.
I want to combine all of those iframes into a single HTML file.
Is this possible?

yuval
  • What does that mean? Do you want to concatenate all of the HTML together? What purpose does it serve? – GLaDOS Feb 28 '16 at 12:16
  • Yes, I want to concatenate all of the HTML together into one HTML file. The purpose is kind of complicated, so I can't explain it. – yuval Feb 28 '16 at 12:29
  • @yuval. Maybe it's "complicated" because you're going about it in the wrong way? This is starting to look like a classic [XY Problem](http://xyproblem.info/). – ekhumoro Feb 28 '16 at 19:48

2 Answers

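I already partly answered this question in the comments to the answer you copied your code from. You cannot simply concatenate separate HTML pages into one page: the HTML format just doesn't work like that. Each page is a complete document with its own root element, head, and base URL, so gluing several of them together does not produce a valid document.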

There are tools that can save a complete webpage into a single file, but they all use a special format for doing it. One such format is MHTML, which is a proposed standard documented as RFC 2557. If you take a brief glance at it, you will see that it is far more complicated than simply gluing chunks of html together.

If you want to do this properly, I would suggest you look for a tool that has support for a format like MHTML.
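
To give a rough idea of what such a container looks like, here is a minimal sketch that bundles several pages into one multipart/related message, the MIME structure that MHTML is built on, using Python's standard `email` package. The function name `build_mhtml`, the `Subject` value, and the `menu.htm` URL are made up for illustration. In the question's code the pairs would come from `frame.baseUrl().toString()` and `frame.toHtml()`. This only shows the general shape of the format; it is not a complete RFC 2557 writer, and whether a given browser opens the result is not something this sketch guarantees.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mhtml(pages):
    """Bundle (url, html) pairs into one multipart/related message.

    The first pair is treated as the root document and the rest as the
    frames it refers to. Every part carries a Content-Location header so
    a reader can match it to the URL used in the root document.
    """
    message = MIMEMultipart("related", type="text/html")
    message["Subject"] = "Saved page"  # illustrative value
    for url, html in pages:
        part = MIMEText(html, "html", "utf-8")
        part["Content-Location"] = url
        message.attach(part)
    return message.as_string()


# Hypothetical input: a main page and one frame it embeds.
pages = [
    ("http://10.0.0.101/default.htm",
     '<html><body><iframe src="http://10.0.0.101/menu.htm"></iframe></body></html>'),
    ("http://10.0.0.101/menu.htm",
     "<html><body>menu</body></html>"),
]
mhtml_text = build_mhtml(pages)
```

Note that each page stays a separate part with its own headers. That is the key difference from concatenating the HTML strings: the frames are kept apart and addressed by URL rather than merged into one document.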

ekhumoro

If you're interested in just concatenating the HTML of all pages, you can add an attribute that accumulates the HTML of every frame as it finishes loading, using frame.toHtml():

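class Sp():
    # Class-level default; the first += in save() creates an instance
    # attribute that accumulates the HTML of every loaded frame.
    all_html = ''

    def save(self, ok, frame=None):
        if frame is None:
            print('main-frame')
            frame = self.webView.page().mainFrame()
        else:
            print('child-frame')
        # Append this frame's full HTML to the running total.
        self.all_html += frame.toHtml()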
        print('Time: ' + str(time.time() - startTime))
        print('URL: %s' % frame.baseUrl().toString())
        print('METADATA: %s' % frame.metaData())
        print('TAG: %s' % frame.documentElement().tagName())
        print('HTML: ' + frame.documentElement().toInnerXml())
        print()

Setting the combined HTML on the main web frame might not work, for several reasons; cross-domain policy is one example. If you are interested though, you can use:

self.webView.page().mainFrame().setHtml(self.all_html)

or only set the HTML with parts of the frames.
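
If you want to set only some of the frames, one option is to store each frame's HTML separately instead of in a single string. The sketch below is one possible way to do that in plain Python; `FrameStore`, `add`, and `combined` are hypothetical names, not part of PyQt4, and the sample URLs and markup are placeholders.

```python
from collections import OrderedDict


class FrameStore:
    """Keeps each frame's HTML separately, keyed by the frame's URL."""

    def __init__(self):
        self.html_by_url = OrderedDict()

    def add(self, url, html):
        # A frame can finish loading more than once; keep the latest copy.
        self.html_by_url[url] = html

    def combined(self, urls=None):
        # Join all stored HTML, or only the frames whose URLs are listed.
        selected = self.html_by_url if urls is None else urls
        return "\n".join(self.html_by_url[u] for u in selected)


# Placeholder data standing in for what the frames would report.
store = FrameStore()
store.add("http://10.0.0.101/default.htm", "<p>main</p>")
store.add("http://10.0.0.101/menu.htm", "<p>menu</p>")
only_menu = store.combined(["http://10.0.0.101/menu.htm"])
```

In save(), the call would then be something like store.add(frame.baseUrl().toString(), frame.toHtml()), assuming a FrameStore instance was created in main(). The same caveat applies as before: the joined string is still several documents placed one after another, not a single valid page.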

GLaDOS
  • No, but I would like to place the frames in their relevant iframe tags, and not just concatenate the frames one after another. – yuval Feb 28 '16 at 13:41
  • Please explain what you mean; the frames are already placed in their relevant iframe tags. Also, I suggest you edit your question, since it seems that you want something different from what you initially asked. – GLaDOS Feb 28 '16 at 13:42
  • @GLaDOS. He wants to replace the `iframes` with the html from the pages that they load - which obviously cannot possibly work in such a simplistic fashion. – ekhumoro Feb 28 '16 at 19:39