Web Scraping in Python - Understat.com

Question

I am trying to scrape the table of teams and their stats from https://understat.com/league/EPL, but my code does not work and fails with the error 'NoneType' object is not callable.

This is the code I tried (I am new to this and would appreciate your help):

from bs4 import BeautifulSoup
import requests
import pandas as pd

result = requests.get("https://understat.com/league/EPL")
src = result.content
soup = BeautifulSoup(src, 'html.parser')
columns = ("No", "Team", "M", "W", "D", "L", "G", "GA", "Pts", "xG", "xGA", "xPts")
df = pd.DataFrame(columns=columns)

for row in soup("table", {"class": "calendar-container"}).find_all("tr"):
    tds = row.find_all("td")
    row = (td.text.replace("\n", "") for td in tds)
    df = df.append(pd.Series(row, index=columns), ignore_index=True)

You need to call a method on your soup object. `soup.find()` maybe? — NomadMonad, Apr 18 '20 at 12:13

score 0 · Answer 1 · answered Apr 18 '20 at 12:23

The page is rendered by JavaScript. If you view the page source in your browser, you can see that all of the data is held in JavaScript variables called datesData and playersData, so the table is not present in the HTML that requests downloads. Take a look at this answer for two different approaches to getting JavaScript-rendered pages: Not able to extract nested table body with pandas from webpage.

Another option here is to use a regular expression to pull the variable out of the page source and then parse it with the json module.

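import json
import re

import requests

result = requests.get("https://understat.com/league/EPL")

# Placeholder: the real pattern needs one capture group around the variable's JSON text
patt = re.compile(r'pattern for script variable you want')
src = result.text  # decoded str; a str pattern cannot search the bytes in result.content
match = patt.search(src)
if match:
    var = json.loads(match.group(1))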
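
To make the regex-plus-JSON idea above more concrete, here is a minimal sketch. It rests on assumptions that are not confirmed in this answer: that the page embeds each variable as var name = JSON.parse('...') with characters written as \xNN escapes, and that a variable named teamsData exists. The helper name extract_js_json and the sample string are hypothetical, so check the actual page source and adjust the pattern before relying on it.

```python
import json
import re


def extract_js_json(html, var_name):
    """Return the JSON value assigned to a JavaScript variable in html, or None.

    Assumes the page embeds the data as:  var <name> = JSON.parse('<escaped JSON>')
    """
    pattern = re.compile(
        r"var\s+" + re.escape(var_name) + r"\s*=\s*JSON\.parse\('(.*?)'\)",
        re.DOTALL,
    )
    match = pattern.search(html)
    if match is None:
        return None
    # Turn escapes such as \x7B back into literal characters.
    # Note: this round trip is only safe when the captured text is plain ASCII.
    decoded = match.group(1).encode("utf-8").decode("unicode_escape")
    return json.loads(decoded)


# Made-up sample standing in for the downloaded page source (result.text).
sample = r"<script>var teamsData = JSON.parse('\x7B\x22id\x22\x3A\x2289\x22\x7D');</script>"
print(extract_js_json(sample, "teamsData"))  # {'id': '89'}
```

With a real page you would pass result.text instead of sample. The function returns None when the variable is not found, which is easier to check for than an exception in the middle of a loop.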
