
Is there a way to extract the values of all the <option> elements in the following HTML form <select> into a Python list, like ['a','b','c','d']?

<select name="sel">
   <option value="a">a</option>
   <option value="b">b</option>
   <option value="c">c</option>
   <option value="d">d</option>
</select>

Many thanks in advance.

martineau
DGT

2 Answers

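import re

text = '''<select name="sel">
   <option value="a">a</option>
   <option value="b">b</option>
   <option value="c">c</option>
   <option value="d">d</option>
</select>'''

# Capture each option's value; the backreference (?P=val) requires the
# option's visible text to be identical to its value attribute.
pattern = re.compile(r'<option value="(?P<val>.*?)">(?P=val)</option>')
# With a single capturing group, findall returns that group's text for each match.
handy_list = pattern.findall(text)
print(handy_list)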

This prints:

['a', 'b', 'c', 'd']

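Disclaimer: parsing HTML with regular expressions does not work in the general case. This pattern only matches options written exactly as above: a double-quoted value as the first attribute and text identical to the value.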
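To illustrate the disclaimer, here is a small check showing that the same pattern silently returns nothing for options that a browser would treat as ordinary markup. The variant strings are made-up examples, not taken from the question.

```python
import re

# Same pattern as in the answer above.
pattern = re.compile(r'<option value="(?P<val>.*?)">(?P=val)</option>')

# Hypothetical variants of valid HTML <option> markup.
variants = [
    "<option value='a'>a</option>",           # single-quoted attribute
    '<option selected value="b">b</option>',  # another attribute before value
    '<option value="c">Choice C</option>',    # text differs from the value
    '<OPTION VALUE="d">d</OPTION>',           # uppercase tag and attribute names
]

for html in variants:
    # Each findall returns [] because the regex expects one exact spelling.
    print(repr(html), '->', pattern.findall(html))
```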

nmichaels

You might want to look at BeautifulSoup if you also need to parse other HTML data:

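# Requires BeautifulSoup 4 (the bs4 package): pip install beautifulsoup4
from bs4 import BeautifulSoup

text = '''<select name="sel">
   <option value="a">a</option>
   <option value="b">b</option>
   <option value="c">c</option>
   <option value="d">d</option>
</select>'''

# Use Python's built-in HTML parser so no extra parser package is needed.
soup = BeautifulSoup(text, 'html.parser')

# .string is each option's text; use i['value'] to read the value attribute instead.
print([i.string for i in soup.find_all('option')])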
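If installing a third-party package is not an option, the standard library's html.parser module can do a similar job. This is a sketch of an alternative, not part of the original answer: the class name OptionValueParser is made up for illustration, it reads the value attribute (which the question asks for) rather than the option text, and it only collects options inside the <select> whose name matches. Options without a value attribute would yield None here, whereas browsers fall back to the option's text.

```python
from html.parser import HTMLParser


class OptionValueParser(HTMLParser):
    """Hypothetical helper: collect <option> values inside one named <select>."""

    def __init__(self, select_name):
        super().__init__()
        self.select_name = select_name
        self.in_target_select = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling this.
        attrs = dict(attrs)
        if tag == 'select':
            self.in_target_select = attrs.get('name') == self.select_name
        elif tag == 'option' and self.in_target_select:
            self.values.append(attrs.get('value'))

    def handle_endtag(self, tag):
        if tag == 'select':
            self.in_target_select = False


text = '''<select name="sel">
   <option value="a">a</option>
   <option value="b">b</option>
   <option value="c">c</option>
   <option value="d">d</option>
</select>'''

parser = OptionValueParser('sel')
parser.feed(text)
parser.close()
print(parser.values)  # ['a', 'b', 'c', 'd']
```

Unlike the regex, this handles single-quoted attributes, extra attributes, and uppercase tag names, because the parser normalizes them before the handlers run.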
Rafi