I am parsing an XML file, downloaded from the internet, using lxml. It has a structure similar to this:
<root>
<a>Some text in A node</a>
<b><c>Some text in C node</c>Some text in B node</b>
</root>
I want to print the text inside the nodes with the following piece of code:
from lxml import etree
doc = etree.parse('some.xml')
root = doc.getroot()
for ch in root:
    print(ch.text)
Output
Some text in A node
None
This does not print the text for <b>. Why? When I change the XML (shown below) so that the text comes first and the child nodes after it, I get the correct output. Is this something to do with the XML syntax or with lxml? Since I cannot control the XML, because it is downloaded directly from the internet, I need a way to get the text from the original format.
<root>
<a>Some text in A node</a>
<b>Some text in B node<c>Some text in C node</c></b>
</root>
Output
Some text in A node
Some text in B node
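
To make this reproducible without some.xml, here is a minimal sketch that parses both variants from strings. The names CHILD_FIRST, TEXT_FIRST and child_texts are made up for illustration, and the import falls back to the standard library's xml.etree.ElementTree if lxml is not installed (assuming its API matches lxml for these calls). It also prints the .tail of <c> and the result of itertext() on <b>, which appears to be where the text after a child element ends up:

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: the standard library module is API-compatible for these calls.
    import xml.etree.ElementTree as etree

# Original layout: child element first, then the text of <b>.
CHILD_FIRST = (
    "<root>"
    "<a>Some text in A node</a>"
    "<b><c>Some text in C node</c>Some text in B node</b>"
    "</root>"
)

# Modified layout: text of <b> first, then the child element.
TEXT_FIRST = (
    "<root>"
    "<a>Some text in A node</a>"
    "<b>Some text in B node<c>Some text in C node</c></b>"
    "</root>"
)


def child_texts(xml):
    """Return the .text of each direct child of the root element."""
    return [ch.text for ch in etree.fromstring(xml)]


print(child_texts(CHILD_FIRST))  # ['Some text in A node', None]
print(child_texts(TEXT_FIRST))   # ['Some text in A node', 'Some text in B node']

# Inspect the original layout more closely.
b = etree.fromstring(CHILD_FIRST).find("b")
c = b.find("c")
print(c.tail)                 # text that follows </c> inside <b>
print("".join(b.itertext()))  # all text inside <b>, in document order
```

With the original layout, b.text is None because no text precedes the first child, while c.tail holds "Some text in B node".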