Can't figure out why soup.find_all() returns an empty list

Question

I'm new to Python and am practicing web scraping with BeautifulSoup.

I've checked some similar questions, such as this one, this one, and this one, but I'm still stuck.

Here is my code:

import urllib.request
from bs4 import BeautifulSoup

html = urllib.request.urlopen("https://en.wikipedia.org/wiki/List_of_largest_recorded_music_markets").read()
soup = BeautifulSoup(html, 'html.parser')

tbody = soup.find_all('table',{"class":"wikitable plainrowheaders sortable jquery-tablesorter"})

First, I don't think the page I'm scraping uses the kind of JavaScript mentioned in those similar questions. I want to extract the data in its tables, but when I run print(tbody), I get an empty list. Could someone take a look and give me some hints?

Thank you.

The `jquery-tablesorter` class looks like it is added by JavaScript. Omit it from the `class` parameter. – Eric Truett, Apr 30 '20 at 01:35
I see, sorry for my carelessness. By the way, is it roughly correct that classes added by JavaScript start with "j"? – Xiao Lu, Apr 30 '20 at 13:46
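On the follow-up question: a class name's prefix is not a dependable way to tell whether JavaScript added it. A more direct check is to list the classes BeautifulSoup actually sees in the downloaded HTML and compare them with what the browser's developer tools show; a class that only appears in the browser was added by a script. This is a minimal sketch: `table_classes` is a hypothetical helper, and `SAMPLE_HTML` is placeholder markup standing in for the live page.

```python
from bs4 import BeautifulSoup


def table_classes(html):
    """List the class attribute of every <table> in `html`, as parsed.

    BeautifulSoup only sees the HTML it is given; classes that JavaScript
    adds later in a browser will not appear here.
    """
    soup = BeautifulSoup(html, "html.parser")
    # "class" is multi-valued, so each entry is a list of class names.
    return [table.get("class", []) for table in soup.find_all("table")]


# Placeholder HTML standing in for the page downloaded with urllib.
SAMPLE_HTML = """
<table class="wikitable plainrowheaders sortable"><tr><td>x</td></tr></table>
<table><tr><td>no class</td></tr></table>
"""

print(table_classes(SAMPLE_HTML))
# [['wikitable', 'plainrowheaders', 'sortable'], []]
```

With the question's code, passing the `html` bytes returned by `urlopen(...).read()` to `table_classes` should show the classes served in the raw page; per the comment above, `jquery-tablesorter` should not be among them.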

score 0 · Accepted Answer · answered Apr 30 '20 at 01:39


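You must remove the jquery-tablesorter part of the class string. That class is added dynamically by JavaScript after the page loads in a browser, so it is not in the HTML that urllib downloads, and a search that includes it matches nothing.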

This should work:

import urllib.request
from bs4 import BeautifulSoup

html = urllib.request.urlopen("https://en.wikipedia.org/wiki/List_of_largest_recorded_music_markets").read()
soup = BeautifulSoup(html, 'html.parser')

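# Match the class attribute as it appears in the raw HTML (without "jquery-tablesorter").
# find() returns the first matching table, or None if nothing matches.
tbody = soup.find('table', {"class": "wikitable plainrowheaders sortable"})
print(tbody)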
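The exact-string match above works only while the table's class attribute in the raw HTML is exactly `wikitable plainrowheaders sortable`, in that order. As a sketch of more tolerant alternatives, the snippet below runs on placeholder HTML (not real Wikipedia data): it reproduces the empty result from the question, then matches by a single class with `class_` and by a CSS selector with `select()`, and finally pulls each row's cell text, since the goal was to extract the table data. `table_rows` is a hypothetical helper, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page: the class attribute is written
# as it appears in the raw HTML, i.e. without the JavaScript-added
# "jquery-tablesorter" class. Rows are placeholders, not real data.
SAMPLE_HTML = """
<table class="wikitable plainrowheaders sortable">
  <tr><th>Rank</th><th>Market</th></tr>
  <tr><th scope="row">1</th><td>Market A</td></tr>
  <tr><th scope="row">2</th><td>Market B</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A space-separated class string is compared against the whole attribute value,
# so an extra class that is not in the raw HTML makes the search match nothing.
with_js_class = soup.find_all(
    "table", {"class": "wikitable plainrowheaders sortable jquery-tablesorter"}
)

# A single class matches any table whose class list contains it.
by_single_class = soup.find_all("table", class_="wikitable")

# A CSS selector requires every listed class but ignores order and extra classes.
by_selector = soup.select("table.wikitable.sortable")


def table_rows(table):
    """Return each row of `table` as a list of stripped cell strings."""
    return [
        [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        for row in table.find_all("tr")
    ]


print(len(with_js_class), len(by_single_class), len(by_selector))  # 0 1 1
for row in table_rows(by_selector[0]):
    print(row)
```

`select()` is usually the safer choice when a tag must carry several classes, because it does not depend on their order or on extra classes a page might add.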

answered Apr 30 '20 at 01:39

zenalc


I got it. I removed the JavaScript-added class and it worked. Thank you! – Xiao Lu Apr 30 '20 at 13:46
If this helped you, can you please mark the answer as solved? Thank you – zenalc Apr 30 '20 at 15:37
