.findall() doesn't find anything when the document element has attributes. Why does this happen, and how can I fix it?

Here is the code:

from lxml import etree as et

text = '''\
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">\
<text class="what1 y2">abc</text><text class="what17 x1">nbc</text>\
<text class="f18 sf4 f12" textLength="72.18">ID_NUM.47</text></svg>'''

tree = et.fromstring(text)
for elem in tree.findall(".//text"):
  if elem.text == "ID_NUM.47":
    elem.getparent().remove(elem)
print(et.tostring(tree))

tree.findall(".//text") returns an empty list.

But with the following document, in which the attributes of the svg tag are removed, all elements are found:

text = '''\
<svg><text class="what1 y2">abc</text><text class="what17 x1">nbc</text>\
<text class="f18 sf4 f12" textLength="72.18">ID_NUM.47</text></svg>'''

Also, when I replace .findall() with, for example, .xpath('//*[attribute::textLength]'), the expected elements are found in both documents.

macxpat

1 Answer

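The attributes on the svg element are namespace declarations, not ordinary attributes. The declaration xmlns="http://www.w3.org/2000/svg" puts svg and all of its unprefixed descendants in the SVG namespace, so the path .//text, which only matches text elements in no namespace, finds nothing. To select an element in a namespace you need to take the namespace into account. lxml allows this with, e.g., for elem in tree.findall(".//text", namespaces={'': 'http://www.w3.org/2000/svg'}).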
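As a minimal sketch of the same idea, the snippet below reuses the document from the question and shows two other ways to name the namespace that I would expect to work across lxml versions: a prefix mapping and Clark notation ({uri}localname). The prefix "svg" and the names SVG_NS, ns, matches and same are arbitrary choices made for this illustration, not anything lxml requires. It also shows why the XPath expression from the question matched: the wildcard * selects elements regardless of their namespace.

```python
from lxml import etree as et

SVG_NS = "http://www.w3.org/2000/svg"  # namespace URI declared on <svg>

text = (
    '<svg xmlns="http://www.w3.org/2000/svg" '
    'xmlns:xlink="http://www.w3.org/1999/xlink">'
    '<text class="what1 y2">abc</text><text class="what17 x1">nbc</text>'
    '<text class="f18 sf4 f12" textLength="72.18">ID_NUM.47</text></svg>'
)

tree = et.fromstring(text)

# The parsed tag name includes the namespace in Clark notation.
print(tree[0].tag)  # {http://www.w3.org/2000/svg}text

# An unqualified name only matches elements in no namespace.
print(len(tree.findall(".//text")))  # 0

# The XPath wildcard ignores the namespace, so the attribute test matches.
print(len(tree.xpath("//*[@textLength]")))  # 1

# Option 1: bind a prefix of your choosing to the namespace URI.
ns = {"svg": SVG_NS}
matches = tree.findall(".//svg:text", namespaces=ns)

# Option 2: spell out the namespace in Clark notation, no mapping needed.
same = tree.findall(".//{%s}text" % SVG_NS)

# Remove the element from the question, now that it can be found.
for elem in matches:
    if elem.text == "ID_NUM.47":
        elem.getparent().remove(elem)

print(et.tostring(tree).decode())
```

The prefix used in the path does not have to match anything in the document; only the namespace URI it maps to matters.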

Martin Honnen