
I'm web scraping with Beautiful Soup. I have an HTML page that contains 8 tables, and I'm trying to extract the contents of these tables.

    for row in soup('table')[4].tbody('tr'):
      tds = row('td')
      print tds[0].string, tds[1].string

It gives the error:

    for row in soup('table')[4].tbody('tr'):
    TypeError: 'NoneType' object is not callable

I understand that soup('table')[4] is probably becoming None. But I don't understand why similar code worked here but not in my case.

claws

1 Answer


There is no <tbody> tag in your actual HTML.

In your browser DOM, the <tbody> tag is often added automatically: the browser inserts it because the HTML specification states there should be one, but that doesn't mean your actual HTML source has that tag in it. BeautifulSoup does not add it for you, so .tbody is None, and calling None raises the TypeError.

Go straight for the <tr> tags:

for row in soup('table')[4]('tr'):
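
As a fuller illustration, here is a minimal sketch, assuming the bs4 package with Python's built-in html.parser. The sample HTML and the helper name extract_rows are made up for the example; they are not from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: a table whose source has no <tbody> tag.
SAMPLE_HTML = """
<table>
  <tr><td>alpha</td><td>1</td></tr>
  <tr><td>beta</td><td>2</td></tr>
</table>
"""


def extract_rows(html, table_index=0):
    """Return (first cell, second cell) text pairs from one table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup("table")[table_index]
    # table.tbody is None when the source has no <tbody>,
    # so fall back to the table element itself.
    body = table.tbody or table
    rows = []
    for row in body("tr"):
        tds = row("td")
        rows.append((tds[0].string, tds[1].string))
    return rows


for first, second in extract_rows(SAMPLE_HTML):
    print(first, second)
```

The `table.tbody or table` fallback is a design choice rather than a requirement: it keeps the loop working both when the tag is absent and when it is present, for example with a parser such as html5lib that follows browser parsing rules and does insert it.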
Martijn Pieters
  • Damn! Damn! Deceived by the browser – claws May 28 '13 at 07:58
  • The dev tools 'structure' window is not always the best source of information on the HTML source of a file. Double-check against the actual 'view source' result, or download the page yourself and verify that. :-) – Martijn Pieters May 28 '13 at 08:02