
I'm using BeautifulSoup. I need to find all <div> tags whose id looks like post-#, where # is a number.

For example:

<div id="post-45">...</div>
<div id="post-334">...</div>

I have tried:

from bs4 import BeautifulSoup

html = '<div id="post-45">...</div> <div id="post-334">...</div>'
soupHandler = BeautifulSoup(html, 'html.parser')
# '*' was intended as a wildcard here
print(soupHandler.findAll('div', id='post-*'))

How can I filter this?

daaawx
Max Frai

4 Answers


You can pass a function to findAll:

>>> print(soupHandler.findAll('div', id=lambda x: x and x.startswith('post-')))
[<div id="post-45">...</div>, <div id="post-334">...</div>]

Or a regular expression:

>>> import re
>>> print(soupHandler.findAll('div', id=re.compile('^post-')))
[<div id="post-45">...</div>, <div id="post-334">...</div>]
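A self-contained version of both filters, as a sketch: it assumes bs4 with Python's built-in 'html.parser', and it adds a hypothetical `<div id="header">` (not in the question) so you can see a non-matching tag being skipped.

```python
import re

from bs4 import BeautifulSoup

# The "header" div is a made-up extra tag that should NOT match
html = '<div id="post-45">...</div> <div id="post-334">...</div> <div id="header">...</div>'
# 'html.parser' ships with Python, so no extra parser install is needed
soup = BeautifulSoup(html, 'html.parser')

# Filter 1: a function applied to each div's id; "x and ..." guards against a missing id
by_function = soup.find_all('div', id=lambda x: x and x.startswith('post-'))

# Filter 2: a compiled regular expression tested against each id
by_regex = soup.find_all('div', id=re.compile('^post-'))

print([div['id'] for div in by_function])  # ['post-45', 'post-334']
print([div['id'] for div in by_regex])     # ['post-45', 'post-334']
```

The function form is handy when the rule is easier to express in Python than as a pattern; the regex form is shorter for simple prefix checks.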
jfs
Mark Byers

Since the question asks to match "post-" followed by a number, it's more precise to use:

import re
[...]
soupHandler.findAll('div', id=re.compile(r"^post-\d+"))
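To see how far the digit requirement goes, here is a small sketch with invented ids (`post-12-comments` and `post-abc` are hypothetical, not from the question). `^post-\d+` rejects `post-abc`, but it still accepts an id that merely starts with `post-<digits>`; adding `$` requires the digits to run to the end of the id.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: the last two ids are invented edge cases
html = (
    '<div id="post-45">...</div>'
    '<div id="post-12-comments">...</div>'
    '<div id="post-abc">...</div>'
)
soup = BeautifulSoup(html, 'html.parser')

# No end anchor: "post-12-comments" matches because it begins with "post-12"
prefix_only = soup.find_all('div', id=re.compile(r'^post-\d+'))

# With "$": the whole id must be "post-" followed only by digits
exact = soup.find_all('div', id=re.compile(r'^post-\d+$'))

print([d['id'] for d in prefix_only])  # ['post-45', 'post-12-comments']
print([d['id'] for d in exact])        # ['post-45']
```

The `r"..."` prefix keeps `\d` from being treated as a (invalid) string escape.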
xiamx

This works for me:

from bs4 import BeautifulSoup
import re

html = '<div id="post-45">...</div> <div id="post-334">...</div>'
soupHandler = BeautifulSoup(html, 'html.parser')

for match in soupHandler.find_all('div', id=re.compile("post-")):
    print(match.get('id'))

Output:
post-45
post-334
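One caveat with the unanchored "post-" pattern: the BeautifulSoup docs state that a regular-expression filter is applied with its search() method, so it matches "post-" anywhere inside the id. A minimal sketch, using a made-up `mypost-7` id, shows the difference a leading `^` makes.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: "mypost-7" merely contains "post-" in the middle
html = '<div id="post-45">...</div> <div id="mypost-7">...</div>'
soup = BeautifulSoup(html, 'html.parser')

# search() semantics: "post-" can appear anywhere in the id
unanchored = soup.find_all('div', id=re.compile('post-'))

# "^" restricts matches to ids that start with "post-"
anchored = soup.find_all('div', id=re.compile('^post-'))

print([d.get('id') for d in unanchored])  # ['post-45', 'mypost-7']
print([d.get('id') for d in anchored])    # ['post-45']
```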
soupHandler.findAll('div', id=re.compile(r"^post-\d+$"))

looks right to me.

Auston