I want to remove the curly braces and XML namespace using lxml and just report the tag name

Question

I have the following XML document (the real one is much longer):

<?xml version ="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE fmresultset PUBLIC "-//FMI//DTD fmresultset//EN" "http://localhost:16020/fmi/xml/fmresultset.dtd">
<fmresultset xmlns="http://www.filemaker.com/xml/fmresultset" version="1.0">
<error code="0">
</error>
<product build="11/11/2014" name="FileMaker Web Publishing Engine" version="13.0.5.518">
</product>

I use the following Python to extract some of the tag names:

from lxml import etree

doc = etree.fromstring(resulttxt)
print(doc.attrib)
print(doc.tag)
print(doc[4][0][0].tag)
if doc[4][0][0].tag == 'field':
    print('hi')

What I'm getting though is:

{'version': '1.0'}
{http://www.filemaker.com/xml/fmresultset}fmresultset
{http://www.filemaker.com/xml/fmresultset}field

The xmlns declaration doesn't show up as an attribute of the root tag, but it is there.

It is also placed in front of each tag name, which makes it difficult to loop through the elements and use conditionals. I want doc.tag to show just the tag name, not the namespace plus the tag name.

This is day 1 for me using this. Could anyone help out?

The [tag:processing] tag should only be used for questions about the Processing language. — Kevin Workman, Apr 01 '15 at 18:51

score 2 · Answer 1 · edited May 23 '17 at 11:50

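You need to handle namespaces. In your case the document declares a default namespace (one with no prefix), so map it to a prefix of your own choosing when searching: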

from lxml import etree as ET

data = b"""<?xml version ="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE fmresultset PUBLIC "-//FMI//DTD fmresultset//EN" "http://localhost:16020/fmi/xml/fmresultset.dtd">
<fmresultset xmlns="http://www.filemaker.com/xml/fmresultset" version="1.0">
    <error code="0">
    </error>
    <product build="11/11/2014" name="FileMaker Web Publishing Engine" version="13.0.5.518">
    </product>
</fmresultset>
"""

namespaces = {
  "myns": "http://www.filemaker.com/xml/fmresultset"
}

tree = ET.fromstring(data)
print(tree.find("myns:product", namespaces=namespaces).attrib.get("name"))

Prints:

FileMaker Web Publishing Engine

But that doesn't really answer my question of how do I get the tag names minus the namespace value in the curly braces. Look at my "if" statement on the last line of the python...maybe that will be clearer then. — BostonMacOSX, Apr 01 '15 at 20:21
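
For the bare tag names themselves, one possible approach is to strip the {namespace} prefix from each tag string. The following is only a minimal sketch: the helper name local_name and the shortened sample document are made up for illustration, and it assumes every node visited is a regular element (comments and processing instructions do not have string tags). lxml also provides etree.QName(element).localname, which returns the same bare name.

```python
# lxml is what the question uses. The standard-library ElementTree reports
# tags in the same "{namespace}name" form, so it serves as a fallback here.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Shortened, illustrative version of the document from the question.
xml = b"""<fmresultset xmlns="http://www.filemaker.com/xml/fmresultset" version="1.0">
    <error code="0"></error>
    <product name="FileMaker Web Publishing Engine"></product>
</fmresultset>"""


def local_name(tag):
    """Hypothetical helper: turn '{uri}name' into 'name'.

    Tags without a namespace prefix are returned unchanged.
    """
    return tag.rpartition("}")[2]


root = etree.fromstring(xml)
names = [local_name(child.tag) for child in root]

print(local_name(root.tag))
print(names)

# Conditionals can then compare against the bare name.
for child in root:
    if local_name(child.tag) == "product":
        print(child.get("name"))
```

Alternatively, the conditional can compare against the fully qualified form, for example doc.tag == '{http://www.filemaker.com/xml/fmresultset}fmresultset', which avoids accidentally matching an element with the same local name from a different namespace.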
