I want to extract the value of the "archivo" key from markup like this:
...
<applet name="bla" code="Any.class" archive="Any.jar">
<param name="abc" value="space='1' archivo='bla.jpg'" </param>
<param name="def" value="space='2' archivo='bli.jpg'" </param>
<param name="jkl" value="space='3' archivo='blu.jpg'" </param>
</applet>
...
I need a list like [bla.jpg, bli.jpg, ...], so I tried options like:
inputTag = soup.findAll("param",{'value':'archivo'})
or
inputTag = soup.findAll(attrs={"value" : "archivo"})
or
inputTag = soup.findAll("archivo")
and I always get an empty list: []
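A likely reason for the empty lists: a plain string in the attribute filter is compared against the whole attribute value, and findAll("archivo") looks for a tag named archivo. Neither can match here, because archivo only appears inside the text of the value attribute. The sketch below imitates that comparison on plain strings (the values list is copied by hand from the sample above, and BeautifulSoup itself is not involved) and shows that a regular-expression search does find the key.

```python
import re

# The three value attributes from the sample markup, copied by hand
values = [
    "space='1' archivo='bla.jpg'",
    "space='2' archivo='bli.jpg'",
    "space='3' archivo='blu.jpg'",
]

# Exact comparison, which is how a plain-string filter behaves: no hits
exact_hits = [v for v in values if v == "archivo"]

# Regular-expression search: every value contains the key
regex_hits = [v for v in values if re.search(r"archivo=", v)]

print(exact_hits)
print(len(regex_hits))
```

If that reasoning is right, passing a compiled pattern instead of a string, for example {'value': re.compile('archivo')}, should select the param tags, although the file name still has to be pulled out of each value afterwards.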
Other unsuccessful options:
inputTag = soup.findAll("param",{"value" : "archivo"}.contents)
I get an error like: 'dict' object has no attribute 'contents'
inputTag = unicode(getattr(soup.findAll('archivo'), 'string', ''))
I get nothing.
Finally, I read "Difference between attrMap and attrs in beautifulSoup" and tried:
for tag in soup.recursiveChildGenerator():
    print tag['archivo']
This finds nothing; it only works for real tag attributes such as name, code or archive.
And lastly:
tag.attrs = [(key,value) for key,value in tag.attrs if key == 'archivo']
but tag.attrs finds nothing either.
OK, with jcollado's help I was able to get the list this way:
import re

imageslist = []
patron = re.compile(r"archivo='([\w\./]+)'")
for tag in soup.findAll('param'):
    # Search once and reuse the match object
    match = patron.search(tag['value'])
    if match:
        imageslist.append(match.group(1))
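The same idea can be pushed a little further by reading every key='value' pair of an attribute value into a dict, so that archivo becomes one lookup among others. This is only a sketch: parse_value is a made-up helper name, it assumes the pairs are always single-quoted as in the sample, and it works on plain strings copied from the sample rather than on soup.

```python
import re

# Matches key='value' pairs such as archivo='bla.jpg' (single quotes assumed)
PAIR = re.compile(r"(\w+)='([^']*)'")

def parse_value(value):
    """Hypothetical helper: turn "space='1' archivo='bla.jpg'" into a dict."""
    return dict(PAIR.findall(value))

# The three value attributes from the sample markup, copied by hand
values = [
    "space='1' archivo='bla.jpg'",
    "space='2' archivo='bli.jpg'",
    "space='3' archivo='blu.jpg'",
]

imageslist = []
for value in values:
    pairs = parse_value(value)
    # Skip values that carry no archivo key at all
    if "archivo" in pairs:
        imageslist.append(pairs["archivo"])

print(imageslist)
```

With the real document, the strings would come from tag['value'] for each tag returned by soup.findAll('param').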