Parsing XML in Python using minidom

Question

I have an XML file like this:

<root>
    <entry>
        <accession>A</accession>
        <accession>B</accession>
        <accession>C</accession>
        <feature type="cross-link" description="sumo2">
            <location>
                <position position="15111992"/>
            </location>
        </feature>
        <feature type="temp" description="blah blah sumo">
            <location>
                <position position="12345"/>
            </location>
        </feature>
    </entry>
    <entry>
        <accession>X</accession>
        <accession>Y</accession>
        <accession>Z</accession>
        <feature type="test" description="testing">
            <location>
                <position position="1"/>
            </location>
        </feature>
        <feature type="cross-link" description="sumo hello">
            <location>
                <position position="11223344"/>
            </location>
        </feature>
    </entry>
</root>

I need to fetch the value of the position attribute for every feature whose type is "cross-link" and whose description contains the word "sumo". This is what I have tried so far; it correctly gives me the features whose type is "cross-link" and whose description contains the word "sumo".

from xml.dom import minidom
xmldoc = minidom.parse('P38398.xml')
itemlist = xmldoc.getElementsByTagName('feature')

for s in itemlist:
    feattype = s.attributes['type'].value
    description = s.attributes['description'].value
    if "SUMO" in description:
        if "cross-link" in feattype:
            print(feattype + "," + description)

How can I extract the value of position once I have the feature type as "cross-link" and description containing the word "sumo"?

guidot · Accepted Answer · 2017-04-25T12:11:20.550


You are nearly there, except for two points:

Change your "SUMO" search string to lowercase ("sumo") so that it matches the data given above.

Then add something like the following to your loop body, inside the two if checks:

posList = s.getElementsByTagName('position')
for p in posList:
    print("-- position is {}".format(p.attributes['position'].value))
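Putting both points together, the whole loop might look like the sketch below. It is a minimal, self-contained illustration rather than the asker's exact script: the XML is a trimmed inline copy of the sample from the question, the function name `sumo_crosslink_positions` is made up for this example, the type is compared with `==` instead of `in`, and the description is lowercased before matching (an assumption that makes the check case-insensitive).

```python
from xml.dom import minidom

# Trimmed copy of the XML from the question, inlined so the sketch runs on its own.
SAMPLE_XML = """<root>
  <entry>
    <feature type="cross-link" description="sumo2">
      <location><position position="15111992"/></location>
    </feature>
    <feature type="temp" description="blah blah sumo">
      <location><position position="12345"/></location>
    </feature>
  </entry>
  <entry>
    <feature type="test" description="testing">
      <location><position position="1"/></location>
    </feature>
    <feature type="cross-link" description="sumo hello">
      <location><position position="11223344"/></location>
    </feature>
  </entry>
</root>"""


def sumo_crosslink_positions(xml_text):
    """Hypothetical helper: positions of cross-link features mentioning sumo."""
    doc = minidom.parseString(xml_text)
    positions = []
    for feature in doc.getElementsByTagName("feature"):
        feat_type = feature.getAttribute("type")
        description = feature.getAttribute("description")
        # Assumption: lowercase the description so "SUMO" and "sumo" both match.
        if feat_type == "cross-link" and "sumo" in description.lower():
            # Only search for <position> elements nested inside this feature.
            for pos in feature.getElementsByTagName("position"):
                positions.append(pos.getAttribute("position"))
    return positions


print(sumo_crosslink_positions(SAMPLE_XML))  # ['15111992', '11223344']
```

To read from the file instead, `minidom.parse('P38398.xml')` can replace the `parseString` call, as in the question.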

edited Apr 25 '17 at 12:11

answered Apr 25 '17 at 12:05


miken32 · Answer 2 · 2017-09-12T23:27:32.457


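This is a job for XPath. The query below selects every feature element whose type attribute equals "cross-link" and whose description attribute contains "sumo", then returns the position attribute of each nested position element as a string.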

from lxml import etree
root = etree.parse('P38398.xml').getroot()
xpquery = '//feature[@type="cross-link" and contains(@description, "sumo")]//position/@position'
for att in root.xpath(xpquery):
    print(att)
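If lxml is not installed, a similar result is possible with the standard library's `xml.etree.ElementTree`, with one caveat: its XPath subset supports `[@attr='value']` predicates but not `contains()`, so the substring check has to be done in Python. The sketch below is an alternative under that assumption, not the lxml approach itself; the inline XML is a trimmed copy of the question's sample and the function name `sumo_crosslink_positions_et` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question, inlined so the sketch runs on its own.
SAMPLE_XML = """<root>
  <entry>
    <feature type="cross-link" description="sumo2">
      <location><position position="15111992"/></location>
    </feature>
    <feature type="temp" description="blah blah sumo">
      <location><position position="12345"/></location>
    </feature>
  </entry>
  <entry>
    <feature type="test" description="testing">
      <location><position position="1"/></location>
    </feature>
    <feature type="cross-link" description="sumo hello">
      <location><position position="11223344"/></location>
    </feature>
  </entry>
</root>"""


def sumo_crosslink_positions_et(xml_text):
    """Hypothetical helper: same filter as the XPath query, using ElementTree."""
    root = ET.fromstring(xml_text)
    return [
        pos.get("position")
        # The attribute-equality predicate is handled by ElementTree's XPath subset.
        for feature in root.iterfind(".//feature[@type='cross-link']")
        # contains() is not supported, so test the substring in Python instead.
        if "sumo" in feature.get("description", "")
        for pos in feature.iter("position")
    ]


print(sumo_crosslink_positions_et(SAMPLE_XML))  # ['15111992', '11223344']
```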

edited Sep 12 '17 at 23:27

answered Apr 25 '17 at 22:12

