I am trying to match port numbers in <span> tags from an HTML page:
<span class="tbBottomLine" style="width:50px;">
8080
</span>
<span class="tbBottomLine" style="width:50px;">
80
</span>
<span class="tbBottomLine" style="width:50px;">
3124
</span>
<span class="tbBottomLine" style="width:50px;">
1142
</span>
Script:
import urllib2
import re
h = urllib2.urlopen('http://www.proxy360.cn/Region/Brazil')
html = h.read()
parser_port = '<span.*>\s*([0-9]){2,}\s*</span>'
p = re.compile(parser_port)
list_port = p.findall(html)
print list_port
But I'm getting this output:
['8', '8', '0', '0', '0', '8', '8', '0', '0', '8', '8', '8', '8', '8', '8', '8', '8', '0']
I need it to match the whole port number, 8080 for example.
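A minimal sketch of one possible fix, run against the sample markup instead of the live page. In the pattern from the script, `{2,}` repeats the capturing group itself, and a repeated group only keeps its last iteration, which is why a single digit comes back for each match. Here the quantifier is moved inside the parentheses, and `.*` is swapped for `[^>]*` so the match cannot run past the end of the opening tag. The names `sample_html` and `port_pattern` are illustrative and not part of the original script.

```python
import re

# Sample markup copied from the question; it stands in for the downloaded page.
sample_html = """
<span class="tbBottomLine" style="width:50px;">
8080
</span>
<span class="tbBottomLine" style="width:50px;">
80
</span>
<span class="tbBottomLine" style="width:50px;">
3124
</span>
<span class="tbBottomLine" style="width:50px;">
1142
</span>
"""

# {2,} is now inside the parentheses, so the group captures every digit.
# [^>]* keeps the match within the opening tag instead of a greedy .*
port_pattern = re.compile(r'<span[^>]*>\s*([0-9]{2,})\s*</span>')

ports = port_pattern.findall(sample_html)
print(ports)  # ['8080', '80', '3124', '1142']
```

The same pattern should drop into the original script in place of `parser_port`, assuming the live page uses the same tag layout as the sample.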