
I was trying to scrape some web content. The script runs fine in Spyder's Python console (not the IPython console), but it throws an error when run from the Windows command line.

```python
import sys

from PyQt4.QtGui import *
from PyQt4.QtCore import *
from PyQt4.QtWebKit import *
from bs4 import BeautifulSoup


class Render(QWebPage):
    def __init__(self, url):
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        # Call _loadFinished when the page has finished loading
        self.loadFinished.connect(self._loadFinished)
        self.mainFrame().load(QUrl(url))
        # Block in the Qt event loop until _loadFinished calls quit()
        self.app.exec_()

    def _loadFinished(self, result):
        self.frame = self.mainFrame()
        self.app.quit()


url = "http://www.shfe.com.cn/en/MarketData/dataview.html?paramid=dailystock"
r = Render(url)
result = r.frame.toHtml()
soup = BeautifulSoup(result, 'lxml')
```

This is the error message:

```
    soup = BeautifulSoup(result, 'lxml')
  File "C:\Anaconda2\lib\site-packages\bs4\__init__.py", line 225, in __init__
    markup, from_encoding, exclude_encodings=exclude_encodings)):
  File "C:\Anaconda2\lib\site-packages\bs4\builder\_lxml.py", line 118, in prepare_markup
    for encoding in detector.encodings:
  File "C:\Anaconda2\lib\site-packages\bs4\dammit.py", line 257, in encodings
    self.markup, self.is_html)
  File "C:\Anaconda2\lib\site-packages\bs4\dammit.py", line 319, in find_declared_encoding
    declared_encoding = declared_encoding_match.groups()[0].decode(
AttributeError: 'QString' object has no attribute 'decode'
```
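Judging from the traceback, bs4's lxml builder appears to hand `unicode` input to lxml directly, and otherwise falls back to `EncodingDetector`, which treats the markup as a byte string and ends up calling `.decode()` on part of it. A PyQt4 `QString` is neither `unicode` nor a byte string, hence the `AttributeError`. The following is a simplified, hypothetical model of that dispatch (not bs4's actual code); `prepare_markup_model` and `FakeQString` are invented names, the latter standing in for `QString`.

```python
try:
    text_type = unicode  # Python 2, as in the question's Anaconda2 setup
except NameError:
    text_type = str  # Python 3


class FakeQString(object):
    """Hypothetical stand-in for PyQt4's QString: holds text, has no .decode()."""

    def __init__(self, value):
        self._value = value


def prepare_markup_model(markup):
    """Simplified model of the dispatch seen in the traceback (not bs4's code)."""
    if isinstance(markup, text_type):
        # Text is handed on unchanged; no encoding detection is needed
        return markup
    # Anything else is assumed to be a byte string and decoded
    return markup.decode("ascii", "replace")


if __name__ == "__main__":
    print(prepare_markup_model(b"<p>ok</p>"))  # prints <p>ok</p>
    try:
        prepare_markup_model(FakeQString(u"<p>ok</p>"))
    except AttributeError as exc:
        print(exc)  # prints 'FakeQString' object has no attribute 'decode'
```

With a byte string the model decodes normally; with the stand-in it fails with the same kind of `AttributeError` as the real traceback.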

Python 2.7.13 |Anaconda custom (32-bit)| (default, Dec 19 2016, 13:36:02) [MSC v.1500 32 bit (Intel)] on win32
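A plausible explanation for the Spyder/cmd difference, which the thread does not confirm: Spyder's Python console is believed to switch PyQt4 to its version 2 string API via `sip.setapi`, under which Qt methods such as `toHtml()` return native Python strings, whereas a plain `python` run on Python 2 uses PyQt4's default version 1 API and returns `QString`. If that is the cause, the same switch can be made in the script itself, as in this sketch; `use_pyqt4_v2_string_api` is a hypothetical helper name, and `sip.setapi` only takes effect if it is called before any `PyQt4` module is imported.

```python
import sys


def use_pyqt4_v2_string_api():
    """Hypothetical helper: make PyQt4 return Python strings instead of QString.

    Must run before any PyQt4 module is imported. Returns False when
    sip (the binding library PyQt4 is built on) cannot be imported.
    """
    if "PyQt4" in sys.modules:
        raise RuntimeError("call this before importing PyQt4")
    try:
        import sip
    except ImportError:
        return False
    # Select API version 2 for QString: Python 2 unicode objects are used instead
    sip.setapi("QString", 2)
    return True
```

Call it as the first statement of the script, ahead of the `from PyQt4...` imports; `r.frame.toHtml()` should then already return a `unicode` object that `BeautifulSoup` accepts.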

jf328
  • This may be helpful: [attributeerror-qstring-object-has-no-attribute-rfind](https://stackoverflow.com/questions/37263086/attributeerror-qstring-object-has-no-attribute-rfind) – t.m.adam Sep 14 '17 at 22:48
  • Interesting... `BeautifulSoup(unicode(result), 'lxml')` worked! Thanks. – jf328 Sep 14 '17 at 23:08
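The workaround from the comments, converting the `QString` to a native string before parsing, can be wrapped in a small version-agnostic helper. A minimal sketch, assuming only that the object converts cleanly with `unicode()` on Python 2 (as `QString` did in the comment above) or `str()` on Python 3; `as_text` and `FakeQString` are hypothetical names used for illustration.

```python
try:
    text_type = unicode  # Python 2 (the question uses Anaconda2 / Python 2.7)
except NameError:
    text_type = str  # Python 3


def as_text(markup):
    """Return markup as text; bytes and text pass through unchanged."""
    if isinstance(markup, (bytes, text_type)):
        return markup
    # e.g. a PyQt4 QString: unicode(qstring) on Python 2 gives a unicode copy
    return text_type(markup)


class FakeQString(object):
    """Hypothetical stand-in for QString, used only to exercise as_text."""

    def __init__(self, value):
        self._value = value

    def __str__(self):
        return self._value
```

In the question's script, the last line would then read `soup = BeautifulSoup(as_text(r.frame.toHtml()), 'lxml')`, which amounts to the `unicode(result)` call that worked in the comment.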
