
Code:

from defusedxml import ElementTree as etree
s = b'<?xml version="1.0"?><GetQueueAttributesResponse xmlns="http://queue.amazonaws.com/doc/2012-11-05/"><GetQueueAttributesResult><Attribute><Name>ApproximateNumberOfMessages</Name><Value>2</Value></Attribute></GetQueueAttributesResult><ResponseMetadata><RequestId>xxxx</RequestId></ResponseMetadata></GetQueueAttributesResponse>'
print(etree.fromstring(s))

Expected Output:
It should show the complete XML data (same as the input), so that it can be parsed further.

Actual Output:
It shows only the first line:

<Element '{http://queue.amazonaws.com/doc/2012-11-05/}GetQueueAttributesResponse' at 0x09B50720>

This seems to be all the data it reads: I tried functions like findall() and getchildren() on this output, and they return nothing further.

How can I resolve this issue? Or, if there is an alternative library that takes a similar approach, please suggest it.

Alternatively, a library that converts such XML data directly to JSON or a dict would be very helpful.
However, it should produce readable output, unlike xmltodict, which returns hard-to-read OrderedDicts.

Note: Whichever library is suggested must also be secure, unlike the standard library's xml package, which has known vulnerabilities.

SmiP

2 Answers

1
from defusedxml import ElementTree as etree
tree = etree.parse('file.xml')
root = tree.getroot()
# gives the below output:
# <Element '{http://queue.amazonaws.com/doc/2012-11-05/}GetQueueAttributesResponse' at 0x1107c7b88>
root.findall('.//{http://queue.amazonaws.com/doc/2012-11-05/}Attribute')
# gives the below output:
# [<Element '{http://queue.amazonaws.com/doc/2012-11-05/}Attribute' at 0x1107c7c28>]

However, I had to save the XML to a file for this.

Update for inline XML: it works the same way as when the XML is saved to a separate file.

s = b'<?xml version="1.0"?><GetQueueAttributesResponse xmlns="http://queue.amazonaws.com/doc/2012-11-05/"><GetQueueAttributesResult><Attribute><Name>ApproximateNumberOfMessages</Name><Value>2</Value></Attribute></GetQueueAttributesResult><ResponseMetadata><RequestId>xxxx</RequestId></ResponseMetadata></GetQueueAttributesResponse>'
etree.fromstring(s).findall('.//{http://queue.amazonaws.com/doc/2012-11-05/}Attribute')

Reference: Parse XML namespace with Element Tree findall
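
As a follow-up to the snippets above, here is a minimal sketch of the same lookup using a prefix-to-URI mapping instead of spelling out the full `{uri}` in every path. It also serialises the parsed tree back to a string to show that the whole document was read, not just the root element. The `sqs` prefix and the `NS`, `XML`, `full_xml` and `attributes` names are assumptions made for this example; the stdlib fallback import is there only so the sketch runs where defusedxml is not installed.

```python
# Parsing goes through defusedxml when it is installed. The stdlib fallback
# exists only so this sketch runs anywhere; do not use it on untrusted input.
try:
    from defusedxml.ElementTree import fromstring
except ImportError:
    from xml.etree.ElementTree import fromstring

# Serialising an already-parsed tree is not the risky step, so the stdlib
# tostring is used here.
from xml.etree.ElementTree import tostring

XML = (
    b'<?xml version="1.0"?>'
    b'<GetQueueAttributesResponse xmlns="http://queue.amazonaws.com/doc/2012-11-05/">'
    b'<GetQueueAttributesResult><Attribute>'
    b'<Name>ApproximateNumberOfMessages</Name><Value>2</Value>'
    b'</Attribute></GetQueueAttributesResult>'
    b'<ResponseMetadata><RequestId>xxxx</RequestId></ResponseMetadata>'
    b'</GetQueueAttributesResponse>'
)

# "sqs" is an arbitrary prefix chosen for this example; only the URI matters.
NS = {"sqs": "http://queue.amazonaws.com/doc/2012-11-05/"}

root = fromstring(XML)

# print(root) shows only the repr of the root element; the children are still
# there, as serialising the tree back to text demonstrates.
full_xml = tostring(root, encoding="unicode")

# Collect every <Attribute> as a Name -> Value pair.
attributes = {
    attr.findtext("sqs:Name", namespaces=NS): attr.findtext("sqs:Value", namespaces=NS)
    for attr in root.findall(".//sqs:Attribute", NS)
}
print(attributes)  # {'ApproximateNumberOfMessages': '2'}
```

Without the namespace (for example `root.findall('.//Attribute')`), the search returns an empty list, which would explain why findall() appeared to return nothing.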

Arun Kamalanathan
0

I was able to put together a concise solution from the sample above and its references.

from defusedxml import ElementTree as ETree

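def parse_xml(xml, tag):
    xml_tree = ETree.fromstring(xml)
    # The root tag looks like "{namespace-uri}LocalName"; keep the "{...}" part.
    # If there is no namespace, find() returns -1 and the slice is empty.
    root_tag = xml_tree.tag
    namespace = root_tag[: root_tag.find("}") + 1]
    return [
        # Map each child's local name (namespace stripped) to its text.
        {attr.tag[attr.tag.find("}") + 1 :]: attr.text for attr in element}
        for element in xml_tree.findall(f".//{namespace}{tag}")
    ]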


from unittest import TestCase


class TestParseXML(TestCase):
    def test_parse_xml(self):
        xml = b"""<?xml version="1.0"?>
                            <XResponse xmlns="http://queue.amazonaws.com/doc/2012-11-05/">
                                <XResult>
                                    <XResultEntry>
                                        <Id>1</Id>
                                        <Name>one</Name>
                                    </XResultEntry>
                                    <XResultEntry>
                                        <Id>2</Id>
                                        <Name>two</Name>
                                    </XResultEntry>
                                </XResult>
                                <ResponseMetadata>
                                    <RequestId>testreqid</RequestId>
                                </ResponseMetadata>
                            </XResponse>"""
        data = parse_xml(xml, "XResultEntry")
        self.assertEqual(data, [{"Id": "1", "Name": "one"}, {"Id": "2", "Name": "two"}])
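
For the "convert directly to a dict" part of the question, the same idea can be extended to the whole tree. The following is only a rough sketch under some assumptions: XML attributes and mixed text are ignored, repeated child tags are collected into a list, and `local_name` and `element_to_dict` are hypothetical helper names, not part of defusedxml. The stdlib fallback import is there only so the sketch runs where defusedxml is not installed.

```python
# Prefer defusedxml for parsing; the stdlib fallback exists only so this
# sketch runs anywhere and should not be used on untrusted input.
try:
    from defusedxml.ElementTree import fromstring
except ImportError:
    from xml.etree.ElementTree import fromstring


def local_name(tag):
    """Drop a leading '{namespace-uri}' from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def element_to_dict(element):
    """Recursively convert an element into plain dicts, lists and strings."""
    children = list(element)
    if not children:
        # Leaf element: return its text (empty string if there is none).
        return (element.text or "").strip()
    result = {}
    for child in children:
        key = local_name(child.tag)
        value = element_to_dict(child)
        if key in result:
            # Repeated tag: collect the values in a list.
            if not isinstance(result[key], list):
                result[key] = [result[key]]
            result[key].append(value)
        else:
            result[key] = value
    return result


xml = (
    b'<XResponse xmlns="http://queue.amazonaws.com/doc/2012-11-05/"><XResult>'
    b'<XResultEntry><Id>1</Id><Name>one</Name></XResultEntry>'
    b'<XResultEntry><Id>2</Id><Name>two</Name></XResultEntry>'
    b'</XResult></XResponse>'
)
data = element_to_dict(fromstring(xml))
print(data)
# {'XResult': {'XResultEntry': [{'Id': '1', 'Name': 'one'}, {'Id': '2', 'Name': 'two'}]}}
```

The result contains only built-in dict, list and str objects, so it can be passed straight to json.dumps().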
SmiP