
Could someone please explain why I get a blank result when running this code? I am simply trying to print the contents of an HTML tag using Beautiful Soup. The code is below.

Thanks

import urllib3
from bs4 import BeautifulSoup

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
http = urllib3.PoolManager()

def stats():
    url = 'https://www.flashscore.com.au/football/usa/mls/results/'
    response = http.request('GET', url)
    soup = BeautifulSoup(response.data, 'lxml')
    right_table = soup.find('div', {'class': 'fs-table tournament-page'})
    print(right_table.text)

stats()
t.m.adam
  • The problem is that the webpage is client-side rendered, so the div initially has no content. Different technologies are required: https://stackoverflow.com/questions/2148493/scrape-html-generated-by-javascript-with-python – Nelson Yeung Aug 15 '17 at 14:10
  • The page uses JavaScript and a WebSocket to load and fetch and render the data. See my answer to https://stackoverflow.com/questions/45259232/scraping-google-finance-beautifulsoup/ for how to render it. – Dan-Dev Aug 15 '17 at 14:12
  • @Error-SyntacticalRemorse Done, thanks for the tip – Dave Ingram Aug 15 '17 at 16:31
  • @NelsonYeung Thanks for the help – Dave Ingram Aug 15 '17 at 16:32
  • @Dan-Dev Thanks man, finally got it to work using the PyQt5 code you provided – Dave Ingram Aug 15 '17 at 16:33
  • @Dan-Dev Do you know how I could alter the PyQt5 code to deal with multiple urls, it works perfectly for a single url but crashes if I use more. – Dave Ingram Aug 16 '17 at 15:51
  • @Dave Ingram Sorry for the late response; I was abroad with limited internet for the last 10 days. Please see my answer below. – Dan-Dev Aug 27 '17 at 18:42
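The comments above say the target div is filled in by JavaScript after the page loads, so the HTML that urllib3 downloads contains at most an empty placeholder. One way to confirm this is to check the raw HTML for the element and see whether it is missing entirely, present but empty, or present with text. The sketch below uses only the standard library's `html.parser`, so it runs without lxml or bs4. `ClassTextProbe` and `probe` are hypothetical helper names, it assumes an exact match on the whole `class` attribute string, and the sample HTML in any test is made up rather than taken from flashscore.com.au.

```python
from html.parser import HTMLParser


class ClassTextProbe(HTMLParser):
    """Collect the text inside the first <div> whose class attribute equals target_class.

    Hypothetical helper: matches the full class string exactly, e.g.
    "fs-table tournament-page", not individual class names.
    """

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.found = False
        self.depth = 0      # <div> nesting depth inside the target (0 = outside it)
        self.chunks = []    # text fragments seen inside the target div

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1  # a nested <div> inside the target
        elif not self.found and dict(attrs).get("class") == self.target_class:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def probe(html, target_class):
    """Return 'missing', 'empty' or 'has text' for the target div in raw HTML."""
    parser = ClassTextProbe(target_class)
    parser.feed(html)
    parser.close()
    if not parser.found:
        return "missing"
    return "has text" if "".join(parser.chunks).strip() else "empty"
```

For the page in the question you could call it as `probe(response.data.decode("utf-8", "replace"), "fs-table tournament-page")`, assuming the response is UTF-8. A result of `"empty"` or `"missing"` means the table is built client-side, which is why a JavaScript-capable renderer is needed. With bs4 the equivalent check is whether `soup.find(...)` returns `None` or a tag whose `.text` is blank.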

1 Answer

You can fetch and process multiple URLs with PyQt5, as you asked in your comment, like this:

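import sys

import bs4 as bs
from PyQt5.QtCore import QTimer, QUrl
# QtWebKit was dropped from Qt 5.6 onwards, so this needs an older PyQt5 build that still includes it.
from PyQt5.QtWebKitWidgets import QWebPage
from PyQt5.QtWidgets import QApplication


class Render(QWebPage):
    """Loads the given URLs one after another and parses each rendered page."""

    def __init__(self):
        super().__init__()
        # Emitted when the main frame finishes loading a page.
        self.mainFrame().loadFinished.connect(self.handleLoadFinished)

    def start(self, urls):
        self._urls = iter(urls)
        if not self.fetchNext():
            # Empty URL list: quit as soon as the event loop starts instead of hanging.
            QTimer.singleShot(0, QApplication.instance().quit)

    def fetchNext(self):
        """Start loading the next URL; return False when there are none left."""
        try:
            url = next(self._urls)
        except StopIteration:
            return False
        self.mainFrame().load(QUrl(url))
        return True

    def processCurrentPage(self):
        print(self.mainFrame().url().toString())
        # toHtml() serialises the current DOM (including script-added content), not the raw HTTP response.
        soup = bs.BeautifulSoup(self.mainFrame().toHtml(), 'lxml')
        right_table = soup.find('div', {'class': 'fs-table tournament-page'})
        if right_table is None:
            print('Table not found on this page')
        else:
            print(right_table.text)

    def handleLoadFinished(self, ok):
        if ok:
            self.processCurrentPage()
        else:
            print('Failed to load', self.mainFrame().url().toString())
        # Only request the next URL once the current one has been handled.
        if not self.fetchNext():
            QApplication.instance().quit()


if __name__ == '__main__':
    urls = [
        "https://www.flashscore.com.au/football/usa/mls/results/",
        "https://www.flashscore.com.au/football/usa/mls/fixtures/",
    ]
    app = QApplication(sys.argv)  # must exist before any QWebPage is created
    renderer = Render()
    renderer.start(urls)
    sys.exit(app.exec_())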
Dan-Dev
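The key to handling several URLs in the answer above is that `Render` never starts a second load (or a second `QApplication`) while one is in flight: it pulls the next URL from an iterator only inside the `loadFinished` handler. The same callback-chaining pattern can be shown without Qt. In this sketch, `SequentialFetcher`, `fake_load` and the small event queue are hypothetical stand-ins for `QWebPage` and Qt's event loop, so the control flow can be run and tested on its own.

```python
from collections import deque


class SequentialFetcher:
    """Hypothetical Qt-free analogue of Render: one load in flight at a time."""

    def __init__(self, load, on_page, on_done):
        self._load = load        # load(url, callback): starts a load, later calls callback(url, html)
        self._on_page = on_page  # called with (url, html) for each finished page
        self._on_done = on_done  # called once when no URLs are left (plays the role of app.quit())
        self._urls = iter(())

    def start(self, urls):
        self._urls = iter(urls)
        self._fetch_next()

    def _fetch_next(self):
        url = next(self._urls, None)
        if url is None:
            self._on_done()
        else:
            self._load(url, self._handle_load_finished)

    def _handle_load_finished(self, url, html):
        self._on_page(url, html)
        self._fetch_next()       # only now is the next URL requested


def run_demo(urls):
    """Drive SequentialFetcher with a fake loader and a simple event queue; return the event log."""
    events = deque()             # stands in for Qt's event loop
    log = []

    def fake_load(url, callback):
        # Defer the "load finished" callback, as a real event loop would.
        events.append(lambda: callback(url, "<html>" + url + "</html>"))
        log.append(("load", url))

    fetcher = SequentialFetcher(
        fake_load,
        on_page=lambda url, html: log.append(("page", url)),
        on_done=lambda: log.append(("done", None)),
    )
    fetcher.start(urls)
    while events:
        events.popleft()()
    return log
```

Running `run_demo(["a", "b"])` logs each load strictly after the previous page has been handled, then a single `done`. The sketch also calls `on_done` for an empty URL list, so the loop always terminates.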