The following example comes from a very good answer to an existing question, "selecting attribute values from lxml". I would like to refine that question further.
Given this XML, I would like to select the nodes where a particular attribute contains a given string:
<?xml version ="1.0" encoding="UTF-8"?>
<level1>
<level2 first_att='att1.fff.tre' second_att='foo'><name>A</name><age>8</age></level2>
<level2 first_att='att2.ert.wer' second_att='bar'><name>B</name><age>9</age></level2>
<level2 first_att='att2.fff.wer' second_att='bar'><name>C</name><age>10</age></level2>
<level2 first_att='att2.ert.wer' second_att='bar'><name>D</name><age>11</age></level2>
</level1>
One can access the second_att value of the first node ('foo') with:
import lxml.etree as etree
tree = etree.parse("test_file.xml")
print(tree.xpath("//level1/level2[@first_att='att1.fff.tre']/@second_att")[0])
What if I would like to get the nodes where first_att contains 'fff' anywhere (the first and third nodes in the example)?
The ultimate purpose is to get a list of dictionaries to populate a pandas DataFrame:
[{'name':'A','age':8},{'name':'C','age':10}]
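To make the target concrete, here is a minimal sketch of what I am after. It assumes the standard XPath 1.0 contains() function is the right tool for a substring match, and it inlines the XML as bytes instead of reading test_file.xml so that it runs on its own. The names XML, nodes and records are only for illustration.

```python
import lxml.etree as etree

# Same document as above, inlined as bytes so the sketch is self-contained
# (lxml refuses a unicode string that carries an encoding declaration).
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<level1>
<level2 first_att='att1.fff.tre' second_att='foo'><name>A</name><age>8</age></level2>
<level2 first_att='att2.ert.wer' second_att='bar'><name>B</name><age>9</age></level2>
<level2 first_att='att2.fff.wer' second_att='bar'><name>C</name><age>10</age></level2>
<level2 first_att='att2.ert.wer' second_att='bar'><name>D</name><age>11</age></level2>
</level1>"""

root = etree.fromstring(XML)

# contains(haystack, needle) is an XPath 1.0 string function:
# keep the level2 nodes whose first_att has 'fff' anywhere in it.
nodes = root.xpath("//level1/level2[contains(@first_att, 'fff')]")

# One dictionary per matching node, built from its <name> and <age> children.
records = [
    {"name": node.findtext("name"), "age": int(node.findtext("age"))}
    for node in nodes
]

print(records)
```

With the sample data this selects the first and third nodes (A and C). The resulting list could then be passed to pandas.DataFrame(records).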
thanks