Suppose I have the following XML:

<?xml version="1.0" encoding="utf-8"?>
<FeedType xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="https://foo.com/bar" xsi:schemaLocation="https://foo.com/bar https://foo.com/bar/arr.xsd" value="Type">
    <ElementName value='Type'>
        <DataIWant>
            stuff
        </DataIWant>
        <DataIWant>
            other stuff
        </DataIWant>
    </ElementName>
</FeedType>

And I want to get everything in the ElementName tag.

In BeautifulSoup, one could call

soup.find_all('ElementName')

which would return a tree with ElementName as the root.

How can I do this in lxml?

Dr. John A Zoidberg
  • lxml has a findall method... have you tried to use it yet? http://lxml.de/api/lxml.etree._Element-class.html#findall – kpie Jul 18 '16 at 08:47
  • Using `root.findall('ElementName')` returns None. – Dr. John A Zoidberg Jul 18 '16 at 08:53
  • @shivsn If you try using the answer there, you'll find that it returns `None` or `[]` incorrectly for my xml. sample code: `xml = ('stuffother stuff')` `root = etree.fromstring(xml)` `print(root.findall("ElementName"))` – Dr. John A Zoidberg Jul 18 '16 at 08:57

1 Answer

lxml has a findall method, which can be used.

However, the XML document contains a default namespace, and therefore searching for a plain ElementName tag won't find it - you need to specify the namespace:

root.findall('foobar:ElementName', namespaces={'foobar': 'https://foo.com/bar'})
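
As a rough, self-contained sketch of the above (the XML is a trimmed copy of the one in the question, and the names `XML`, `ns`, `plain`, `prefixed`, `clark` and `data` are just illustrative), the following compares the plain lookup with the namespaced one. It also shows the equivalent `{uri}tag` form, which avoids the prefix map. The fallback import is only there so the snippet runs without lxml installed; the standard library's ElementTree accepts the same `findall` arguments.

```python
try:
    from lxml import etree
except ImportError:
    # The stdlib ElementTree accepts the same findall() arguments.
    import xml.etree.ElementTree as etree

# Trimmed copy of the XML from the question (bytes, because of the encoding declaration).
XML = b"""<?xml version="1.0" encoding="utf-8"?>
<FeedType xmlns="https://foo.com/bar" value="Type">
    <ElementName value="Type">
        <DataIWant>stuff</DataIWant>
        <DataIWant>other stuff</DataIWant>
    </ElementName>
</FeedType>"""

root = etree.fromstring(XML)

# The prefix is arbitrary; only the namespace URI has to match the document.
ns = {"foobar": "https://foo.com/bar"}

plain = root.findall("ElementName")  # no namespace given: matches nothing
prefixed = root.findall("foobar:ElementName", namespaces=ns)
clark = root.findall("{https://foo.com/bar}ElementName")  # same lookup, {uri}tag form

# Child lookups need the namespace as well.
data = [el.text.strip() for el in prefixed[0].findall("foobar:DataIWant", namespaces=ns)]

print(len(plain), len(prefixed), len(clark))
print(data)
```

Note that the unprefixed search yields an empty list rather than an error, which is why the problem is easy to miss.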

If you don't want to specify the namespace, you can use an XPath query that will ignore the namespace and just find elements whose "local name" is ElementName:

root.xpath("//*[local-name() = 'ElementName']")
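
A minimal sketch of the XPath approach, again on a trimmed copy of the question's XML (the variable names `XML`, `matches` and `children` are only for illustration). It assumes lxml is installed, since `xpath()` is an lxml method and is not part of the standard library's ElementTree.

```python
from lxml import etree

# Trimmed copy of the XML from the question.
XML = b"""<?xml version="1.0" encoding="utf-8"?>
<FeedType xmlns="https://foo.com/bar" value="Type">
    <ElementName value="Type">
        <DataIWant>stuff</DataIWant>
        <DataIWant>other stuff</DataIWant>
    </ElementName>
</FeedType>"""

root = etree.fromstring(XML)

# local-name() compares only the part of the tag after the namespace,
# so no prefix map is needed.
matches = root.xpath("//*[local-name() = 'ElementName']")

# Each match is an ordinary element; iterating over it yields its children.
children = [child.text.strip() for child in matches[0]]

print(etree.QName(matches[0]).localname)
print(children)
```

The trade-off is that this matches an `ElementName` from any namespace, which is fine for a single-namespace feed like this one but may be too loose for mixed documents.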
Keith Hall
  • Is it possible to ignore the namespace, or have it be parsed automatically, like in BeautifulSoup? – Dr. John A Zoidberg Jul 18 '16 at 09:00
  • @Dr, you can ignore the namespace using XPath - please see my updated answer – Keith Hall Jul 18 '16 at 09:06
  • Is it possible to do something like `root.find_all('ElementName', {'href': 'stuff'})`? As in, select only ElementNames for which the href attribute is 'stuff'? Sorry to be so demanding, but the lxml documentation is rather difficult to understand. – Dr. John A Zoidberg Jul 18 '16 at 09:56
  • Yes, XPath supports this: you can use `root.xpath("//*[local-name() = 'ElementName' and @href = 'stuff']")`. The `lxml` documentation probably doesn't include an XPath tutorial, but you can research it independently of lxml :) – Keith Hall Jul 18 '16 at 09:59
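
To illustrate the attribute filter from the last comment, here is a small sketch. The sample XML in the question has no `href` attribute, so this filters on `value` instead, and the second `ElementName` (with `value="Other"`) is a made-up element added only so the filter has something to exclude. The names `XML`, `ns`, `wanted`, `texts` and `same` are likewise illustrative.

```python
from lxml import etree

# Hypothetical variant of the question's XML with two ElementName elements.
XML = b"""<FeedType xmlns="https://foo.com/bar">
    <ElementName value="Type"><DataIWant>stuff</DataIWant></ElementName>
    <ElementName value="Other"><DataIWant>ignored</DataIWant></ElementName>
</FeedType>"""

root = etree.fromstring(XML)

# Unprefixed attributes are not in the default namespace, so @value needs no prefix.
# $v is an XPath variable supplied as a keyword argument, which avoids quoting issues.
wanted = root.xpath("//*[local-name() = 'ElementName' and @value = $v]", v="Type")
texts = [el[0].text for el in wanted]

# The same filter with findall and an explicit namespace map.
ns = {"foobar": "https://foo.com/bar"}
same = root.findall("foobar:ElementName[@value='Type']", namespaces=ns)

print(texts, len(same))
```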