I am using slightly edited code from Daniel Rodriguez. I am attempting to get all NBA box score data from 2014. There are two prior parts to this code: the first grabs all of the team names and the second grabs all of the games for those teams with ESPN game id, date, home team, home score, away team and away score. These two portions worked just fine.

Then I run the portion that grabs the box score data for each game by game ID. It works for a chunk of the games, then stops on a seemingly random game with the error:

AttributeError: 'NoneType' object has no attribute 'find_all'

I say randomly because I run the same code over and over and it never stops on the same box score. It errors out on a different box score every time.

Here is the code (the ** line is where the error happens):

import numpy as np
import pandas as pd
import requests
import time
from bs4 import BeautifulSoup
import os
os.chdir(r'C:\Users\steven2r\Documents\Python')  # raw string so the backslashes are not read as escapes

games = pd.read_csv('games.csv').set_index('id')
BASE_URL = 'http://espn.go.com/nba/boxscore?gameId={0}'

request = requests.get(BASE_URL.format(games.index[0]))

table = BeautifulSoup(request.text).find('table', class_='mod-data')
heads = table.find_all('thead')
headers = heads[0].find_all('tr')[1].find_all('th')[1:]
headers = [th.text for th in headers]
columns = ['id', 'team', 'player'] + headers
bad_downloads = []

players = pd.DataFrame(columns=columns)

def get_players(players, team_name):
    array = np.zeros((len(players), len(headers)+1), dtype=object)
    array[:] = np.nan
    for i, player in enumerate(players):
        cols = player.find_all('td')
        array[i, 0] = cols[0].text.split(',')[0]
        for j in range(1, len(headers) + 1):
            if not cols[1].text.startswith('DNP'):
                array[i, j] = cols[j].text

    frame = pd.DataFrame(columns=columns)
    for x in array:
        line = np.concatenate(([index, team_name], x)).reshape(1,len(columns))
        new = pd.DataFrame(line, columns=frame.columns)
        frame = pd.concat([frame, new])  # DataFrame.append was removed in pandas 2.0
    return frame

for index, row in games.iterrows():
    print(index)
    request = requests.get(BASE_URL.format(index))
    table = BeautifulSoup(request.text).find('table', class_='mod-data')

    if table == []:
        print(index, 'bad')
        bad_downloads.append(index)
    else:
        heads = table.find_all('thead')  # ** AttributeError is raised here when table is None
        bodies = table.find_all('tbody')

        team_1 = heads[0].th.text
        team_1_players = bodies[0].find_all('tr') + bodies[1].find_all('tr')
        team_1_players = get_players(team_1_players, team_1)
        players = pd.concat([players, team_1_players])

        team_2 = heads[3].th.text
        team_2_players = bodies[3].find_all('tr') + bodies[4].find_all('tr')
        team_2_players = get_players(team_2_players, team_2)
        players = pd.concat([players, team_2_players])

players = players.set_index('id')
print(players)
players.to_csv('players.csv')

print(bad_downloads)
  • You're calling `find_all` on a string. Read the docs – L3viathan Feb 12 '15 at 20:01
  • `find` returns `None`, not `[]`, so to explicitly catch the case where nothing matches you would write `if table is None`. But you have so many `find_all` calls that it is impossible to tell which one is causing the error – Padraic Cunningham Feb 12 '15 at 20:10
  • Thank you very much Padraic! I was trying to figure out what value was returned. This works now. For some reason a handful of the box scores don't get loaded correctly, and they're not the same ones every time! This way I can build a list of the game IDs that don't get pulled through and do them manually. No thanks to you L3viathan, you were unhelpful. – monkeyswithguns Feb 12 '15 at 20:52
  • @monkeyswithguns, look at the answer I just added here http://stackoverflow.com/questions/28447487/problems-parsing-nba-boxscore-data-with-beautifulsoup/28458296#28458296 – Padraic Cunningham Feb 12 '15 at 21:18
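
The `is None` check from the comments can be combined with a simple retry, on the assumption (not confirmed in this thread) that the missing tables come from transient bad responses rather than from permanently different pages. The sketch below is one possible shape, not the original author's code: `fetch_table`, `espn_fetch_html`, `espn_find_table`, and the `retries` and `delay` parameters are hypothetical names introduced here, and `'html.parser'` is just one of the parsers BeautifulSoup accepts.

```python
import time

BASE_URL = 'http://espn.go.com/nba/boxscore?gameId={0}'  # URL pattern from the question


def fetch_table(game_id, fetch_html, find_table, retries=3, delay=1.0):
    """Return the box score table for game_id, or None if every attempt fails.

    fetch_html(game_id) must return the page source as a string.
    find_table(html) must return the parsed table, or None when it is absent.
    Both are supplied by the caller (hypothetical helpers are shown below).
    """
    for attempt in range(retries):
        table = find_table(fetch_html(game_id))
        if table is not None:  # find() reports "no match" as None, never []
            return table
        if attempt < retries - 1:
            time.sleep(delay)  # brief pause before requesting the page again
    return None


def espn_fetch_html(game_id):
    import requests  # third-party; imported only when this helper is called
    return requests.get(BASE_URL.format(game_id)).text


def espn_find_table(html):
    from bs4 import BeautifulSoup  # third-party; imported only when called
    return BeautifulSoup(html, 'html.parser').find('table', class_='mod-data')


# Possible use inside the question's loop:
#
# for index, row in games.iterrows():
#     table = fetch_table(index, espn_fetch_html, espn_find_table)
#     if table is None:
#         bad_downloads.append(index)
#         continue
#     heads = table.find_all('thead')
#     ...
```

Passing the fetch and parse steps in as callables keeps the retry logic separate from the network and the parser, so a different parser can be tried by changing only `espn_find_table`.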

1 Answer

See "Problems Parsing NBA Boxscore Data with BeautifulSoup". It appears BeautifulSoup isn't fully compatible with ESPN. The link above gives an alternate solution.

Jonathan Epstein
  • That question could well have been solved by using a different parser, so this is not necessarily the same problem. Also, a slight change to the HTML and the regex will break – Padraic Cunningham Feb 12 '15 at 20:46