Python BeautifulSoup cannot find table ID

Question

I am running into some trouble scraping a table using BeautifulSoup. Here is my code:

from urllib.request import urlopen
from bs4 import BeautifulSoup

site = "http://www.sports-reference.com/cbb/schools/clemson/2014.html"
page = urlopen(site)
soup = BeautifulSoup(page,"html.parser")

stats = soup.find('table',  id = 'totals')

In [78]: print(stats)
None

When I right-click on the table and inspect the element, the HTML looks as I'd expect. However, when I view the page source, the only element with id = 'totals' is commented out. Is there a way to scrape a table from the commented-out source code?

I have referenced this post but can't seem to replicate their solution.

Here is a link to the webpage I am interested in. I'd like to scrape the table labeled "Totals" and store it as a data frame.

I am relatively new to Python, HTML, and web scraping. Any help would be greatly appreciated.

Thanks in advance.

Michael

Please update your question to include the relevant parts of the HTML source you're trying to scrape from. We need the question to be self-contained. – Soviut, Jun 08 '17 at 00:19

score 1 · Accepted Answer · answered Jun 08 '17 at 04:01

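Comments are string instances in BeautifulSoup (Comment is a subclass of NavigableString). That means you can pass a regular expression as the string argument of BeautifulSoup's find method to locate the comment containing the markup you're after. Once you have that string, have BeautifulSoup parse it as a separate document, and the table inside it can be queried as usual.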

In other words,

import re
from urllib.request import urlopen
from bs4 import BeautifulSoup

site = "http://www.sports-reference.com/cbb/schools/clemson/2014.html"
page = urlopen(site)
soup = BeautifulSoup(page,"html.parser")

stats_html = soup.find(string=re.compile('id="totals"'))
stats_soup = BeautifulSoup(stats_html, "html.parser")

print(stats_soup.table.caption.text)
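
If you would rather not depend on a regular expression, a variation is to walk every comment and re-parse each one until the table turns up. The following is only a minimal sketch under stated assumptions: SAMPLE_HTML is a made-up stand-in for the real page (its players and numbers are invented), and find_commented_table and table_to_rows are hypothetical helper names, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical stand-in for the page source: the table sits inside an HTML comment.
SAMPLE_HTML = """
<div id="all_totals">
<!--
<table id="totals">
<caption>Totals</caption>
<thead><tr><th>Player</th><th>G</th><th>PTS</th></tr></thead>
<tbody>
<tr><td>Player A</td><td>36</td><td>612</td></tr>
<tr><td>Player B</td><td>35</td><td>401</td></tr>
</tbody>
</table>
-->
</div>
"""


def find_commented_table(html, table_id):
    """Return the <table> with the given id if it is hidden in a comment, else None."""
    soup = BeautifulSoup(html, "html.parser")
    # Comment objects are strings, so a string filter can select them.
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        # Re-parse the comment body as its own small document.
        table = BeautifulSoup(comment, "html.parser").find("table", id=table_id)
        if table is not None:
            return table
    return None


def table_to_rows(table):
    """Split a parsed table into a header list and a list of row lists."""
    header = [th.get_text(strip=True) for th in table.thead.find_all("th")]
    rows = [
        [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in table.tbody.find_all("tr")
    ]
    return header, rows


table = find_commented_table(SAMPLE_HTML, "totals")
header, rows = table_to_rows(table)
print(header)
print(rows)
```

Since the goal is a data frame, the two lists could then be handed to pandas, for example pandas.DataFrame(rows, columns=header). I have not checked this against the live page, so the real table may need extra handling, such as repeated header rows.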

score 0 · Answer 2 · answered Jun 08 '17 at 06:18

You can do this:

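# Python 3 imports (urllib2 exists only in Python 2)
from urllib.request import urlopen
from bs4 import BeautifulSoup

site = "http://www.sports-reference.com/cbb/schools/clemson/2014.html"
page = urlopen(site)
soup = BeautifulSoup(page, "lxml")  # requires the lxml package

# Selects the wrapper div with id 'all_totals', not the table with id 'totals'
stats = soup.find_all('div', id='all_totals')
print(stats)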

Please inform me if I helped!

Costis94

Your answer doesn't get access to the div with the id=totals. It gets the one with id=all_totals. Embedded within there is the div he's after, but it's still a comment and as a comment is pretty much unusable. Try actually printing just the div he's after with your solution. You'll need to work with comments to get what's asked for in this question. – clockwatcher Jun 08 '17 at 16:07
