From the URL in the code below, I am ultimately trying to gather all of the players' names from the page. However, when I use .findAll to get all of the list elements, I have not been successful. Please advise.

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

players_url = 'https://stats.nba.com/players/list/?Historic=Y'

# Opening up the Connection and grabbing the page
uClient = uReq(players_url)
page_html = uClient.read()

players_soup = soup(page_html, "html.parser")

# Take all of the list items from the unordered list that contains the players.

list_elements = players_soup.findAll('li', {'class': 'players-list__name'})
QHarr
datam
  • What was not found? – Alderven Mar 21 '19 at 05:51
  • The list of players on the page is generated with JavaScript, so you'll need a client that can fully render the page. A common approach is to drive a browser to visit the URL (you could use Selenium for this), get the page source, then feed that to Beautiful Soup. – Oluwafemi Sule Mar 21 '19 at 05:58
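
To see what that comment means in practice, here is a minimal sketch that counts matching list items in two HTML strings. Both strings are made-up stand-ins (not the real NBA markup): one imitates what a plain HTTP fetch might return before any script runs, the other imitates the page after rendering. `ClassCounter` and `count_items` are hypothetical helper names, and the standard-library `html.parser` is used here in place of Beautiful Soup only so the snippet has no dependencies.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count <li> start tags that carry a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None
        classes = (dict(attrs).get("class") or "").split()
        if tag == "li" and self.css_class in classes:
            self.count += 1


def count_items(html, css_class="players-list__name"):
    """Return how many <li class="..."> items appear in the HTML text."""
    parser = ClassCounter(css_class)
    parser.feed(html)
    return parser.count


# Made-up stand-in for the unrendered response: an empty container plus a script.
STATIC_HTML = '<div id="app"></div><script src="app.js"></script>'

# Made-up stand-in for the same page after JavaScript has filled in the list.
RENDERED_HTML = (
    '<ul><li class="players-list__name"><a href="#">Abdelnaby, Alaa</a></li>'
    '<li class="players-list__name"><a href="#">Abdul-Aziz, Zaid</a></li></ul>'
)

print(count_items(STATIC_HTML))    # 0
print(count_items(RENDERED_HTML))  # 2
```

If the real response looks like the first string, .findAll has nothing to match, no matter which parser is used.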

3 Answers

As @Oluwafemi Sule suggested, it is better to use Selenium together with Beautiful Soup:

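from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Firefox()
driver.get('https://stats.nba.com/players/list/?Historic=Y')
# The list is filled in by JavaScript, so wait for the first entry (the 15 s timeout is an arbitrary choice)
WebDriverWait(driver, 15).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, 'li.players-list__name')))
soup = BeautifulSoup(driver.page_source, 'lxml')  # the 'lxml' parser needs the lxml package
driver.quit()

for li in soup.find_all('li', {'class': 'players-list__name'}):
    print(li.find('a').contents[0])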

Output:

Abdelnaby, Alaa
Abdul-Aziz, Zaid
Abdul-Jabbar, Kareem
Abdul-Rauf, Mahmoud
Abdul-Wahad, Tariq

etc.

Alderven

As @Oluwafemi Sule mentioned in the comment:

The list of players generated in the page is done with javascript.

Instead of Selenium, I recommend the requests-html package, created by the author of the popular requests library. It uses Chromium under the hood to render JavaScript content; the first call to render() downloads Chromium, so it can take a while.

from requests_html import HTMLSession

session = HTMLSession()
r = session.get('https://stats.nba.com/players/list/?Historic=Y')
r.html.render()
for anchor in r.html.find('.players-list__name > a'):
    print(anchor.text)

Output:

Abdelnaby, Alaa
Abdul-Aziz, Zaid
Abdul-Jabbar, Kareem
Abdul-Rauf, Mahmoud
Abdul-Wahad, Tariq
...
radzak

You can do this with requests alone by pulling the names directly from the JavaScript file that provides them.

import requests
import json

r = requests.get('https://stats.nba.com/js/data/ptsd/stats_ptsd.js')
s = r.text.replace('var stats_ptsd = ','').replace('};','}')
data = json.loads(s)['data']['players']
players = [item[1] for item in data]
print(players)
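
The two chained .replace calls depend on the exact text `var stats_ptsd = ` and on the file ending in `};`. A slightly more tolerant sketch is below. It runs on a made-up sample string rather than the live file: only the data → players → [id, name, ...] shape is taken from the code above, and the ids are invented. `parse_js_assignment` is a hypothetical helper name. It only works when the object literal is also valid JSON.

```python
import json


def parse_js_assignment(js_text):
    """Parse the object literal in `var name = {...};` as JSON and return it."""
    # Keep everything after the first "=", whatever the variable is called.
    _, _, literal = js_text.partition("=")
    # Drop surrounding whitespace and the trailing semicolon, if there is one.
    literal = literal.strip().rstrip(";")
    return json.loads(literal)


# Made-up sample with invented ids; the real file is assumed to be much larger.
SAMPLE_JS = (
    'var stats_ptsd = {"data": {"players": ['
    '[1, "Abdelnaby, Alaa"], [2, "Abdul-Aziz, Zaid"]]}};'
)

data = parse_js_assignment(SAMPLE_JS)
# Index 1 of each entry holds the name, as in the list comprehension above.
players = [item[1] for item in data["data"]["players"]]
print(players)  # ['Abdelnaby, Alaa', 'Abdul-Aziz, Zaid']
```

With the live file, you would pass r.text to the helper instead of SAMPLE_JS.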
QHarr