
I'm a novice in Python and am practicing web scraping with BeautifulSoup.

I've checked some similar questions, such as this one, this one, and this one. However, I'm still stuck.

Here is my code:

```python
import urllib.request
from bs4 import BeautifulSoup

html = urllib.request.urlopen("https://en.wikipedia.org/wiki/List_of_largest_recorded_music_markets").read()
soup = BeautifulSoup(html, 'html.parser')

tbody = soup.find_all('table',{"class":"wikitable plainrowheaders sortable jquery-tablesorter"})
```

I don't think this page relies on JavaScript, which was the cause in the similar questions. I want to extract the data from those tables, but when I run print(tbody), it prints an empty list. Could someone take a look and give me some hints?
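One way to test that assumption is to list the class attribute of every table in the HTML that urllib actually received and compare it with what the browser's developer tools show. Below is a minimal sketch; table_classes is a hypothetical helper name, and the inline HTML is a made-up stand-in for the downloaded page, not the real Wikipedia markup.

```python
from bs4 import BeautifulSoup


def table_classes(html):
    """Return the class list of every <table> in html (None for a table without a class).

    html may be a str or the bytes returned by urlopen(...).read().
    """
    soup = BeautifulSoup(html, "html.parser")
    # BeautifulSoup parses "class" as a multi-valued attribute, so each entry is a list.
    return [table.get("class") for table in soup.find_all("table")]


# Made-up stand-in for the downloaded page (hypothetical markup).
sample = '<table class="wikitable plainrowheaders sortable"></table><table></table>'
for classes in table_classes(sample):
    print(classes)
```

Against the live page, pass the html bytes from the code above. If a class you see in the browser is missing from this output, it was most likely added by JavaScript after the page loaded.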

Thank you.

Xiao Lu
  • The `jquery-tablesorter` class looks like it is added by JavaScript. Omit it from the `class` parameter. – Eric Truett Apr 30 '20 at 01:35
  • I see, sorry for my carelessness. By the way, is it roughly true that classes added by JavaScript start with "j"? – Xiao Lu Apr 30 '20 at 13:46
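The comment's point can be reproduced offline. When the class filter is a string containing spaces, BeautifulSoup compares it with the tag's entire class attribute value, so a single class that only exists after JavaScript runs makes the search return nothing. The sketch below uses a simplified, made-up table in place of the real page (an assumption), and also shows a CSS selector via select() as an alternative that checks each class separately, in any order.

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the HTML the server sends (hypothetical, not the real page):
# jquery-tablesorter is absent because no JavaScript has run.
STATIC_HTML = """
<table class="wikitable plainrowheaders sortable">
  <tr><th>Rank</th><th>Market</th></tr>
</table>
"""

soup = BeautifulSoup(STATIC_HTML, "html.parser")

# A multi-class string must equal the whole class attribute value, so this finds nothing.
with_js_class = soup.find_all(
    "table", {"class": "wikitable plainrowheaders sortable jquery-tablesorter"}
)
# The exact server-side value matches.
exact = soup.find_all("table", {"class": "wikitable plainrowheaders sortable"})
# A CSS selector requires each listed class to be present, regardless of order.
by_selector = soup.select("table.sortable.wikitable")

print(len(with_js_class), len(exact), len(by_selector))  # 0 1 1
```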

1 Answer


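You must remove the jquery-tablesorter part. That class is added dynamically by JavaScript after the page loads in a browser, so it is not in the HTML that urllib downloads. Because a class string containing spaces is matched against the whole class attribute value, including it means no table matches.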

This should work:

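```python
import urllib.request
from bs4 import BeautifulSoup

html = urllib.request.urlopen("https://en.wikipedia.org/wiki/List_of_largest_recorded_music_markets").read()
soup = BeautifulSoup(html, 'html.parser')

# Use only the classes present in the server's HTML; jquery-tablesorter is added later by JavaScript.
# find() returns the first matching table (or None); use find_all() to get every match.
table = soup.find('table', {"class": "wikitable plainrowheaders sortable"})
print(table)
```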
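Since the question asks for the data in all of those tables, here is a minimal sketch that goes one step further: it selects every table having both the wikitable and sortable classes with a CSS selector and turns each row into a list of cell texts. The helper name table_rows, its default selector, and the sample HTML are assumptions for illustration. It does not handle rowspan/colspan, and footnote markers such as [1] stay in the cell text.

```python
from bs4 import BeautifulSoup


def table_rows(html, selector="table.wikitable.sortable"):
    """Return one entry per matching table; each entry is a list of rows of cell texts."""
    soup = BeautifulSoup(html, "html.parser")
    tables = []
    for table in soup.select(selector):
        rows = []
        for tr in table.find_all("tr"):
            # Row headers (<th>) and data cells (<td>) are both treated as columns.
            cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
            if cells:
                rows.append(cells)
        tables.append(rows)
    return tables


# Made-up sample (hypothetical data); with the live page, pass the html bytes instead.
SAMPLE = """
<table class="wikitable plainrowheaders sortable">
  <tr><th>Rank</th><th>Market</th><th>Value</th></tr>
  <tr><td>1</td><th scope="row">Market A</th><td>100</td></tr>
</table>
"""

for rows in table_rows(SAMPLE):
    for row in rows:
        print(row)
```

Using select() instead of a class string avoids depending on the exact order of classes in the attribute.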
zenalc