38

I'm attempting to get a list of div ids from a page. When I print out the attributes, I get the ids listed.

for tag in soup.find_all(class_="bookmark blurb group"):
    print(tag.attrs)

results in:

{'id': 'bookmark_8199633', 'role': 'article', 'class': ['bookmark', 'blurb', 'group']}
{'id': 'bookmark_7744613', 'role': 'article', 'class': ['bookmark', 'blurb', 'group']}
{'id': 'bookmark_7338591', 'role': 'article', 'class': ['bookmark', 'blurb', 'group']}
{'id': 'bookmark_7338535', 'role': 'article', 'class': ['bookmark', 'blurb', 'group']}
{'id': 'bookmark_4530078', 'role': 'article', 'class': ['bookmark', 'blurb', 'group']}

So I know there ARE ids. However, when I print out tag.id instead, I just get a list of "None". What am I doing wrong here?

alecxe
klreeher

2 Answers

56

You can access a tag's attributes by treating the tag like a dictionary (documentation):

for tag in soup.find_all(class_="bookmark blurb group"):
    # .get() returns None instead of raising KeyError when the attribute is missing
    print(tag.get('id'))

The reason tag.id didn't work is that it is equivalent to tag.find('id'), which searches for a child tag named id. It returns None because no such tag exists (documentation).
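To make the difference concrete, here is a minimal sketch run against a small made-up HTML snippet (the markup and the ids bookmark_1 and bookmark_2 are placeholders, not the asker's real page). It assumes the built-in "html.parser" backend and compares the three ways of reading the id:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page
html = """
<div id="bookmark_1" role="article" class="bookmark blurb group">first</div>
<div id="bookmark_2" role="article" class="bookmark blurb group">second</div>
<div role="article" class="bookmark blurb group">no id here</div>
"""

soup = BeautifulSoup(html, "html.parser")
tags = soup.find_all(class_="bookmark blurb group")

# tag.id is shorthand for tag.find('id'): it looks for a child <id> tag
attr_lookup = [tag.id for tag in tags]

# tag.get('id') reads the attribute and yields None when it is absent
ids_get = [tag.get("id") for tag in tags]

# tag['id'] raises KeyError when the attribute is absent, so filter first
ids_present = [tag["id"] for tag in tags if tag.has_attr("id")]

print(attr_lookup)
print(ids_get)
print(ids_present)
```

Use tag.get('id') if some elements may lack an id; use tag['id'] if a missing id should be treated as an error.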

alecxe
2

This solution lists every tag on a page that has an id. It might be helpful too.

tags = page_soup.find_all()
for tag in tags:
    if 'id' in tag.attrs:
        print(tag.name,tag['id'],sep='->')
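As a possible variation on the loop above, find_all can do the filtering itself: passing id=True matches only tags that have an id attribute, whatever its value. A short sketch, with a made-up HTML string and placeholder ids:

```python
from bs4 import BeautifulSoup

# Hypothetical markup used only for illustration
html = '<div id="main"><p id="intro">hi</p><span>no id</span></div>'
page_soup = BeautifulSoup(html, "html.parser")

# id=True keeps only tags that carry an id attribute
pairs = [(tag.name, tag["id"]) for tag in page_soup.find_all(id=True)]

for name, tag_id in pairs:
    print(name, tag_id, sep='->')
```

This avoids checking tag.attrs by hand, and tag['id'] is safe here because every match is known to have the attribute.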
Thunder