-1

Can I extract the numbers from the id attributes of the following HTML tags with BeautifulSoup?

<tr align="center" height="15" id="tr_1599656" bgcolor="#ffffff" index="0"></tr>
<tr align="center" height="15" id="tr_1599657" bgcolor="#ffffff" index="1"></tr>
<tr align="center" height="15" id="tr_1599644" bgcolor="#ffffff" index="2"></tr>

Python Code I've tried

from bs4 import BeautifulSoup
import re

html_code = """
<tr align="center" height="15" id="tr_1599656" bgcolor="#ffffff" index="0"></tr>
<tr align="center" height="15" id="tr_1599657" bgcolor="#ffffff" index="1"></tr>
<tr align="center" height="15" id="tr_1599644" bgcolor="#ffffff" index="2"></tr>
"""
soup = BeautifulSoup(html_code,'html.parser')
rows = soup.find_all("tr", {"id": re.compile(r'tr_\d+')})
print(rows)

Expected output

1599656
1599657
1599644
Mary

2 Answers

2
from bs4 import BeautifulSoup
import re

soup = BeautifulSoup('<tr align="center" height="15" id="tr_1599656" bgcolor="#ffffff" index="0"></tr><tr align="center" height="15" id="tr_1599657" bgcolor="#ffffff" index="1"></tr><tr align="center" height="15" id="tr_1599644" bgcolor="#ffffff" index="2"></tr>', 'html.parser')

lines = soup.find_all('tr')

# Print the first run of digits found in each row's id attribute
for line in lines:
    print(re.findall(r'\d+', line['id'])[0])
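
The `[0]` lookup above raises an `IndexError` when an id contains no digits. Here is a minimal sketch of a stricter extraction step, assuming every id of interest looks like `tr_` followed by digits, as in the question. The helper name `row_number` and the extra `"header"` id are hypothetical and only there for illustration:

```python
import re

# Assumption: ids of interest look like "tr_<digits>", as in the question.
ID_PATTERN = re.compile(r"tr_(\d+)")

def row_number(id_value):
    """Return the digits after 'tr_' as a string, or None if the id does not fit the pattern."""
    match = ID_PATTERN.fullmatch(id_value)
    return match.group(1) if match else None

# The three ids from the question plus one hypothetical non-matching id.
ids = ["tr_1599656", "tr_1599657", "tr_1599644", "header"]
numbers = [row_number(i) for i in ids]
print(numbers)  # ['1599656', '1599657', '1599644', None]
```

With bs4 you could call it as `row_number(line['id'])` inside the loop and skip any `None` results.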

Please try once on your own next time. :)

1

Assuming every id attribute follows the pattern tr_XXXXXXX, this code will work:

from bs4 import BeautifulSoup
soup = BeautifulSoup(html_code,'html.parser')
for t in soup.find_all('tr'):
    # t['id'] is e.g. 'tr_1599656'; dropping the first three characters ('tr_') leaves the number
    print(t['id'][3:])

Output

1599656
1599657
1599644

The variable html_code holds the HTML you posted in your question.
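
One caveat: `t['id']` raises a `KeyError` for any row that has no id attribute. Here is a sketch that combines the regex filter from the question with the slice, so only rows whose id matches the pattern are visited. The `<table>` wrapper and the `spacer` row are hypothetical additions to show the filtering, and the attributes are trimmed for brevity:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical input: the question's rows (attributes trimmed) plus one row without an id.
html_code = """
<table>
<tr align="center" id="tr_1599656" index="0"></tr>
<tr align="center" id="tr_1599657" index="1"></tr>
<tr class="spacer"></tr>
<tr align="center" id="tr_1599644" index="2"></tr>
</table>
"""

soup = BeautifulSoup(html_code, "html.parser")

# Keep only <tr> tags whose id is exactly "tr_" followed by digits.
rows = soup.find_all("tr", id=re.compile(r"^tr_\d+$"))

# Strip the "tr_" prefix from each matching id.
numbers = [row["id"][len("tr_"):] for row in rows]
print(numbers)  # ['1599656', '1599657', '1599644']
```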

Hari Krishnan