
The following example comes from a very good answer to an existing question, "selecting attribute values from lxml", and I would like to refine that question further.

Given this XML, I would like to get the nodes where a particular attribute contains a given string:

<?xml version ="1.0" encoding="UTF-8"?>
    <level1>
      <level2 first_att='att1.fff.tre' second_att='foo'><name>A</name><age>8</age></level2>
      <level2 first_att='att2.ert.wer' second_att='bar'><name>B</name><age>9</age></level2>
      <level2 first_att='att2.fff.wer' second_att='bar'><name>C</name><age>10</age></level2>
      <level2 first_att='att2.ert.wer' second_att='bar'><name>D</name><age>11</age></level2>
    </level1>

One can access the second_att value ('foo') of the node whose first_att is exactly 'att1.fff.tre' with:

import lxml.etree as etree
tree = etree.parse("test_file.xml")
print(tree.xpath("//level1/level2[@first_att='att1.fff.tre']/@second_att")[0])

What if I want to get the nodes where first_att contains 'fff' anywhere (the first and third nodes in the example)?

The ultimate purpose is to get a list of dictionaries to populate a pandas DataFrame:

[{'name':'A','age':8},{'name':'C','age':10}]
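A minimal sketch of the dictionary-building step, independent of how the nodes are filtered. It assumes the XML is parsed from an in-memory bytes buffer instead of test_file.xml (so the snippet runs on its own), that ages should become integers as in the target output above, and that pandas is installed. The helper name node_to_record is hypothetical. This version collects every level2 node; selecting only some of them is the actual question.

```python
import io

import lxml.etree as etree
import pandas as pd

# Sample document from the question, kept as bytes: lxml rejects a str
# that carries an XML encoding declaration.
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<level1>
  <level2 first_att='att1.fff.tre' second_att='foo'><name>A</name><age>8</age></level2>
  <level2 first_att='att2.ert.wer' second_att='bar'><name>B</name><age>9</age></level2>
  <level2 first_att='att2.fff.wer' second_att='bar'><name>C</name><age>10</age></level2>
  <level2 first_att='att2.ert.wer' second_att='bar'><name>D</name><age>11</age></level2>
</level1>"""

# etree.parse accepts file-like objects as well as file names
tree = etree.parse(io.BytesIO(XML))


def node_to_record(node):
    """Hypothetical helper: map one <level2> element to a dict of its children."""
    # findtext returns the text of the first matching child element;
    # int() assumes every <age> holds a whole number.
    return {"name": node.findtext("name"), "age": int(node.findtext("age"))}


records = [node_to_record(node) for node in tree.xpath("//level1/level2")]
df = pd.DataFrame(records)  # one row per <level2>, columns 'name' and 'age'
print(df)
```

Keeping the conversion in a small function means the XPath expression can change without touching the dictionary logic.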

Thanks!

JFerro

1 Answer


If you want to match only part of the attribute value rather than the whole value, replace the predicate

[@first_att='att1.fff.tre']

with

[contains(@first_att, 'fff')]
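Applied to the question's sample, a minimal sketch that combines the contains() predicate with the list of dictionaries the question asks for. It assumes the XML is read from an in-memory bytes buffer rather than test_file.xml, that ages are converted to int, and that pandas is installed; none of this comes from the answer itself. Note that contains() matches the substring anywhere in the value, so 'fff' would also match a value such as 'xfffx'.

```python
import io

import lxml.etree as etree
import pandas as pd

# Sample document from the question, kept as bytes for lxml.
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<level1>
  <level2 first_att='att1.fff.tre' second_att='foo'><name>A</name><age>8</age></level2>
  <level2 first_att='att2.ert.wer' second_att='bar'><name>B</name><age>9</age></level2>
  <level2 first_att='att2.fff.wer' second_att='bar'><name>C</name><age>10</age></level2>
  <level2 first_att='att2.ert.wer' second_att='bar'><name>D</name><age>11</age></level2>
</level1>"""

tree = etree.parse(io.BytesIO(XML))

# contains(haystack, needle) is true when needle occurs anywhere in haystack
matches = tree.xpath("//level1/level2[contains(@first_att, 'fff')]")

# Build one dict per matching node; int() assumes <age> is a whole number
records = [
    {"name": node.findtext("name"), "age": int(node.findtext("age"))}
    for node in matches
]
# records -> [{'name': 'A', 'age': 8}, {'name': 'C', 'age': 10}]

df = pd.DataFrame(records)

# The same predicate also works when selecting attribute values directly
second_atts = tree.xpath("//level1/level2[contains(@first_att, 'fff')]/@second_att")
# second_atts -> ['foo', 'bar']
```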
JaSON