
I have the following XML document (the real one is much longer):

<?xml version ="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE fmresultset PUBLIC "-//FMI//DTD fmresultset//EN" "http://localhost:16020/fmi/xml/fmresultset.dtd">
<fmresultset xmlns="http://www.filemaker.com/xml/fmresultset" version="1.0">
<error code="0">
</error>
<product build="11/11/2014" name="FileMaker Web Publishing Engine" version="13.0.5.518">
</product>

I use the following Python to extract some of the tag names:

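from lxml import etree  # assuming lxml, as in the answer below

doc = etree.fromstring(resulttxt)  # resulttxt holds the XML response text
print(doc.attrib)
print(doc.tag)
print(doc[4][0][0].tag)
if doc[4][0][0].tag == 'field':
    print('hi')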

What I'm getting though is:

{'version': '1.0'}
{http://www.filemaker.com/xml/fmresultset}fmresultset
{http://www.filemaker.com/xml/fmresultset}field

The xmlns doesn't show up as an attribute of the root tag, but it is there.

It is also prepended to every tag name, which makes it awkward to loop through the elements and use conditionals: the comparison with 'field' above never matches. I want doc.tag to show just the tag name, without the namespace.

This is day 1 for me using this. Could anyone help out?

alecxe
BostonMacOSX

1 Answer


You need to handle namespaces; in your case it is a default (unprefixed) one:

from lxml import etree as ET

# bytes literal: lxml on Python 3 rejects str input that carries an encoding declaration
data = b"""<?xml version ="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE fmresultset PUBLIC "-//FMI//DTD fmresultset//EN" "http://localhost:16020/fmi/xml/fmresultset.dtd">
<fmresultset xmlns="http://www.filemaker.com/xml/fmresultset" version="1.0">
    <error code="0">
    </error>
    <product build="11/11/2014" name="FileMaker Web Publishing Engine" version="13.0.5.518">
    </product>
</fmresultset>
"""

namespaces = {
  "myns": "http://www.filemaker.com/xml/fmresultset"
}

tree = ET.fromstring(data)
print(tree.find("myns:product", namespaces=namespaces).attrib.get("name"))

Prints:

FileMaker Web Publishing Engine
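
If the goal is to compare bare tag names, one option is to strip the `{namespace}` prefix from each tag. The following is a minimal sketch, not part of the original answer. It uses the standard library's `xml.etree.ElementTree` so that it runs without lxml; both libraries report namespaced tags in the same `{uri}name` form. The helper `local_name` and the shortened `SAMPLE` document are hypothetical names introduced for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the real document (assumption: same namespace URI).
SAMPLE = """<fmresultset xmlns="http://www.filemaker.com/xml/fmresultset" version="1.0">
    <error code="0"></error>
    <product build="11/11/2014" name="FileMaker Web Publishing Engine" version="13.0.5.518"></product>
</fmresultset>"""


def local_name(element):
    """Return the element's tag without a leading '{namespace}' part."""
    tag = element.tag
    # Namespaced tags look like '{uri}name'; plain tags have no braces.
    return tag.rsplit("}", 1)[-1] if tag.startswith("{") else tag


root = ET.fromstring(SAMPLE)
names = [local_name(child) for child in root]

print(local_name(root))
print(names)

# The kind of conditional the question is after:
if local_name(root[1]) == "product":
    print("hi")
```

With lxml specifically, `etree.QName(element).localname` should give the same result without manual string handling. Note that this sketch assumes every node has a string tag; in lxml, comments and processing instructions do not, so they would need to be skipped.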
alecxe
    But that doesn't really answer my question: how do I get the tag names without the namespace value in the curly braces? Look at the "if" statement on the last line of my Python; maybe that makes it clearer. – BostonMacOSX Apr 01 '15 at 20:21
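
For the `if` statement the comment refers to, an alternative to stripping the namespace is to keep it and compare against the fully qualified name. This is a hedged sketch under the assumption that the namespace URI is known up front; `qualified` and `SAMPLE` are hypothetical names, and the standard library's `xml.etree.ElementTree` stands in for lxml so the example is self-contained.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the xmlns attribute of the root element.
NS = "http://www.filemaker.com/xml/fmresultset"

# Shortened stand-in for the real document.
SAMPLE = """<fmresultset xmlns="http://www.filemaker.com/xml/fmresultset" version="1.0">
    <error code="0"></error>
    <product build="11/11/2014" name="FileMaker Web Publishing Engine" version="13.0.5.518"></product>
</fmresultset>"""


def qualified(name, namespace=NS):
    """Build the '{uri}name' form that element.tag uses for namespaced tags."""
    return "{%s}%s" % (namespace, name)


root = ET.fromstring(SAMPLE)

# Compare each child's full tag against the qualified name.
matches = [child.attrib for child in root if child.tag == qualified("product")]

for attrib in matches:
    print(attrib.get("name"))
```

Applied to the question's code, the condition would become `doc[4][0][0].tag == qualified('field')`. Keeping the namespace in the comparison avoids false matches if the document ever mixes elements from several namespaces.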