How can I query an ElementTree with findall('Email') given the following XML?

<DocuSignEnvelopeInformation xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.docusign.net/API/3.0">
    <EnvelopeStatus>
        <RecipientStatus>
                <Type>Signer</Type>
                <Email>joe@gmail.com</Email>
                <UserName>Joe Shmoe</UserName>
                <RoutingOrder>1</RoutingOrder>
                <Sent>2015-05-04T09:58:01.947</Sent>
                <Delivered>2015-05-04T09:58:14.403</Delivered>
                <Signed>2015-05-04T09:58:29.473</Signed>
        </RecipientStatus>
    </EnvelopeStatus>
</DocuSignEnvelopeInformation>

I have a feeling it has to do with the namespace but I'm not sure. I looked at the docs and had no luck.

tree = <xml.etree.ElementTree.ElementTree object at 0x7f27a47c4fd0>
root = tree.getroot()
root
<Element '{http://www.docusign.net/API/3.0}DocuSignEnvelopeInformation' at 0x7f27a47b8a48>

root.findall('Email')
[]
user2954587

2 Answers

You should read the docs more closely, in particular the section on Parsing XML with Namespaces, which includes an example that is almost exactly what you want.

But even without the docs, the answer is actually contained in your example output. When you printed the root element of your document...

>>> from lxml import etree
>>> tree = etree.parse('data.xml')
>>> root = tree.getroot()
>>> root
<Element {http://www.docusign.net/API/3.0}DocuSignEnvelopeInformation at 0x7f972cd079e0>

...you can see that it printed the root element name (DocuSignEnvelopeInformation) qualified by its namespace URI in curly braces ({http://www.docusign.net/API/3.0}). This is not a prefix but the full namespace, and you can use the same {uri}name form as part of your argument to findall:

>>> root.findall('{http://www.docusign.net/API/3.0}Email')

But this by itself won't work, because a bare tag name matches only Email elements that are immediate children of the root element. To make findall search the entire document, prefix the ElementPath expression with .// so that it matches descendants at any depth. This works:

>>> root.findall('.//{http://www.docusign.net/API/3.0}Email')
[<Element {http://www.docusign.net/API/3.0}Email at 0x7f972949a6c8>]
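The difference between the three queries can be checked with the standard library alone. The following is a minimal sketch, assuming xml.etree.ElementTree rather than lxml and a trimmed copy of the XML from the question; the names NS, XML and the result variables are illustrative only.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the xmlns attribute in the question.
NS = "http://www.docusign.net/API/3.0"

# Trimmed copy of the question's document (assumption: fewer fields).
XML = f"""<DocuSignEnvelopeInformation xmlns="{NS}">
    <EnvelopeStatus>
        <RecipientStatus>
            <Type>Signer</Type>
            <Email>joe@gmail.com</Email>
        </RecipientStatus>
    </EnvelopeStatus>
</DocuSignEnvelopeInformation>"""

root = ET.fromstring(XML)

# 1. No namespace: nothing matches, every element is in the default namespace.
no_namespace = root.findall("Email")

# 2. Qualified name, but only direct children of the root are searched.
direct_children = root.findall(f"{{{NS}}}Email")

# 3. Qualified name with .// so descendants at any depth are searched.
descendants = root.findall(f".//{{{NS}}}Email")
emails = [el.text for el in descendants]

print(no_namespace, direct_children, emails)
```

Only the third query returns the Email element, which matches the behaviour described above.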

You can also perform a similar search using XPath and namespace prefixes, like this:

>>> root.xpath('//docusign:Email',
... namespaces={'docusign': 'http://www.docusign.net/API/3.0'})
[<Element {http://www.docusign.net/API/3.0}Email at 0x7f972949a6c8>]

This lets you use XML-style prefix:name syntax instead of the {namespace-uri}name syntax. The .xpath() method is specific to lxml.
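With the standard library's xml.etree.ElementTree, a comparable prefix-based query can be sketched by passing a namespaces mapping to findall or findtext. This is an illustration under assumptions, not the answer's own code: the prefix docusign is an arbitrary label chosen here, and the XML is a trimmed copy of the question's document.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's document (assumption: fewer fields).
XML = """<DocuSignEnvelopeInformation xmlns="http://www.docusign.net/API/3.0">
    <EnvelopeStatus>
        <RecipientStatus>
            <Email>joe@gmail.com</Email>
            <UserName>Joe Shmoe</UserName>
        </RecipientStatus>
    </EnvelopeStatus>
</DocuSignEnvelopeInformation>"""

root = ET.fromstring(XML)

# The prefix is a local alias; only the URI has to match the document.
ns = {"docusign": "http://www.docusign.net/API/3.0"}

# findall returns the matching elements.
matches = root.findall(".//docusign:Email", namespaces=ns)

# findtext returns the text of the first match (or None if nothing matches).
first_email = root.findtext(".//docusign:Email", namespaces=ns)
user_name = root.findtext(
    ".//docusign:RecipientStatus/docusign:UserName", namespaces=ns
)

print(len(matches), first_email, user_name)
```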

larsks
  • Thank you, very helpful. I'd like to use your last example of `root.xpath` but it doesn't look like elementtree elements support .xpath `AttributeError: 'xml.etree.ElementTree.Element' object has no attribute 'xpath'` – user2954587 May 04 '15 at 21:34
  • This answer assumes you're using lxml.etree. – larsks May 04 '15 at 21:43

I got the namespaces option of findall to work with the standard library's ElementTree:


s='<DocuSignEnvelopeInformation xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.docusign.net/API/3.0">     <EnvelopeStatus>         <RecipientStatus>                 <Type>Signer</Type>                 <Email>joe@gmail.com</Email>                 <UserName>Joe Shmoe</UserName>                 <RoutingOrder>1</RoutingOrder>                 <Sent>2015-05-04T09:58:01.947</Sent>                 <Delivered>2015-05-04T09:58:14.403</Delivered>                 <Signed>2015-05-04T09:58:29.473</Signed>         </RecipientStatus>     </EnvelopeStatus> </DocuSignEnvelopeInformation>'

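import xml.etree.ElementTree as ET

# register_namespace only controls the prefixes used when serializing
ET.register_namespace('xsd', 'http://www.w3.org/2001/XMLSchema')
ET.register_namespace('def', 'http://www.docusign.net/API/3.0')
ET.register_namespace('xsi', 'http://www.w3.org/2001/XMLSchema-instance')
# the mapping passed to findall is what resolves the 'def:' prefix in the query
nsmap = {'xsd': 'http://www.w3.org/2001/XMLSchema', 'def': 'http://www.docusign.net/API/3.0', 'xsi': 'http://www.w3.org/2001/XMLSchema-instance'}
e = ET.fromstring(s)
print(ET.tostring(e, 'utf-8'))

print(nsmap)
# prefix form, resolved through nsmap
print(e.findall('.//def:Email', namespaces=nsmap))
# equivalent fully qualified form, no mapping needed
print(e.findall('.//{http://www.docusign.net/API/3.0}Email'))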

gives the following:

b'<def:DocuSignEnvelopeInformation xmlns:def="http://www.docusign.net/API/3.0">     <def:EnvelopeStatus>         <def:RecipientStatus>                 <def:Type>Signer</def:Type>                 <def:Email>joe@gmail.com</def:Email>                 <def:UserName>Joe Shmoe</def:UserName>                 <def:RoutingOrder>1</def:RoutingOrder>                 <def:Sent>2015-05-04T09:58:01.947</def:Sent>                 <def:Delivered>2015-05-04T09:58:14.403</def:Delivered>                 <def:Signed>2015-05-04T09:58:29.473</def:Signed>         </def:RecipientStatus>     </def:EnvelopeStatus> </def:DocuSignEnvelopeInformation>'
{'xsd': 'http://www.w3.org/2001/XMLSchema', 'def': 'http://www.docusign.net/API/3.0', 'xsi': 'http://www.w3.org/2001/XMLSchema-instance'}
[<Element '{http://www.docusign.net/API/3.0}Email' at 0x7f828077a590>]
[<Element '{http://www.docusign.net/API/3.0}Email' at 0x7f828077a590>]
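As far as I can tell, the register_namespace calls are not what makes the lookup succeed; they only choose the prefixes that appear in the serialized output. The sketch below, assuming a trimmed copy of the same XML and an arbitrary prefix d that exists only in the query mapping, runs the search without registering anything.

```python
import xml.etree.ElementTree as ET

NS = "http://www.docusign.net/API/3.0"

# Trimmed copy of the question's document (assumption: fewer fields).
XML = f"""<DocuSignEnvelopeInformation xmlns="{NS}">
    <EnvelopeStatus>
        <RecipientStatus>
            <Email>joe@gmail.com</Email>
        </RecipientStatus>
    </EnvelopeStatus>
</DocuSignEnvelopeInformation>"""

e = ET.fromstring(XML)

# No register_namespace call: the mapping alone resolves the 'd:' prefix.
found = e.findall(".//d:Email", namespaces={"d": NS})

# Serialization still works; without a registered prefix the serializer
# picks a generated one for the namespace.
serialized = ET.tostring(e, encoding="unicode")

print([el.text for el in found])
print(serialized)
```

If the serialized prefixes do not matter, the namespaces mapping on its own appears to be enough for querying.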