
In a Python program, I am looking for a way to find an XML element that contains a specific text and to determine which node number it is.

This is the XML:

<shortcut>
<label>33060</label>
<label2>Common Shortcut</label2>
</shortcut>

<shortcut>
<label>Test</label>
</shortcut>

Of course, I know it is node number 2 here, but the real XML file can be much longer.

These are two ways I tried, but I can't get either to work properly:

from xml.dom import minidom

xmldoc = minidom.parse("/DATA.xml")
Shortcut = xmldoc.getElementsByTagName("shortcut")
Label = xmldoc.getElementsByTagName("label")
print xmldoc.getElementsByTagName("label")[12].firstChild.nodeValue  # this works
for element in Label:
  if  element.getAttributeNode("label") == 'Test':
  # if element.getAttributeNode('label') == "Test":
    print "element found"
else:
    print "element not found"

for node in xmldoc.getElementsByTagName("label"):
    if node.nodeValue == "Test":
        print "element found"
else:
    print "element not found"
Helfenstein
  • What is your expected output for the given XML? – Sait Jul 07 '15 at 04:47
  • It should give Test. But this is a part of the xml. – Helfenstein Jul 07 '15 at 06:12
  • @Helfenstein, is `lxml` an option? Using **XPath** seems the most reasonable solution; if not, you need to iterate over the node tree and check each node's text. I'm not familiar with `minidom`, however. – Anzel Jul 07 '15 at 07:18
  • I would prefer not to use extra modules. It should be able to run as a single program. Otherwise I guess ElementTree would also work. – Helfenstein Jul 07 '15 at 07:56

1 Answer


This working example demonstrates one possible way to search for an element containing specific text using the minidom module*:

from xml.dom.minidom import parseString

def getText(nodelist):
    rc = []
    for node in nodelist:
        if node.nodeType == node.TEXT_NODE:
            rc.append(node.data)
    return ''.join(rc)


xml = """<root>
<shortcut>
<label>33060</label>
<label2>Common Shortcut</label2>
</shortcut>
<shortcut>
<label>Test</label>
</shortcut>
</root>"""
xmldoc = parseString(xml)
labels = xmldoc.getElementsByTagName("label")
for label in labels:
    text = getText(label.childNodes)
    if text == "Test":
        print("node found : " + label.toprettyxml())
        break

Output:

node found : <label>Test</label>

*) The getText() function is taken from the minidom documentation page.
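
The question also asks which node number the match is. A minimal sketch of one way to get that, assuming "node number" means the 1-based position of the enclosing <shortcut> element among all <shortcut> elements (the question does not define it precisely). The helper names get_text and find_shortcut_index are made up for this illustration and are not part of minidom:

```python
from xml.dom.minidom import parseString


def get_text(nodelist):
    # Concatenate the data of all direct text-node children.
    return "".join(n.data for n in nodelist if n.nodeType == n.TEXT_NODE)


def find_shortcut_index(xmldoc, wanted):
    # Hypothetical helper: return the 1-based position of the first
    # <shortcut> that has a <label> whose text equals `wanted`,
    # or None when no such element exists.
    shortcuts = xmldoc.getElementsByTagName("shortcut")
    for index, shortcut in enumerate(shortcuts, start=1):
        # getElementsByTagName matches the exact tag name, so <label2>
        # elements are not included here.
        for label in shortcut.getElementsByTagName("label"):
            if get_text(label.childNodes).strip() == wanted:
                return index
    return None


xml = """<root>
<shortcut>
<label>33060</label>
<label2>Common Shortcut</label2>
</shortcut>
<shortcut>
<label>Test</label>
</shortcut>
</root>"""
xmldoc = parseString(xml)
print(find_shortcut_index(xmldoc, "Test"))  # prints 2
```

If a 0-based index is preferred (for example, to use it with getElementsByTagName("shortcut")[i]), drop the start=1 argument to enumerate.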

har07