Parse XML with Python from clipboard

Question

I am trying to use ElementTree with this sample data from Microsoft, which I have just copied and pasted into a string (perhaps naively).

I have put all of the XML data into a string as follows (this is a truncated example, but I used the full XML):

  data2 = '''
<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
        etc 
        etc'''

Then I used this code:

import xml.etree.ElementTree as ET    
tree2 = ET.fromstring(data2)
print (tree2.find('author').text)

And I get this output:

ParseError: XML or text declaration not at start of entity: line 2, column 0

However, when I try a simple example it works:

data = '''
<p>
  <name>Fred</name>
</p>'''

tree = ET.fromstring(data)
print (tree.find('name').text)

Out:

Fred

Is this because I have done a copy and paste or is my code incorrect? What am I doing wrong here?

Duplicate, have a look [here](http://stackoverflow.com/a/36020709/7216865) — Maurice Meyer, Jan 13 '17 at 13:00

score 1 · Answer 1 · answered Jan 13 '17 at 13:10


data2 = '''<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>'''

Do not start the string with an empty line. The `<?xml ...?>` declaration has to be the very first thing in it.
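Here is a minimal sketch of both the failure and the fix. It uses a shortened version of the question's sample (the shortening is only for brevity). The string that starts with a newline raises `ET.ParseError`. Removing the leading whitespace lets it parse.

```python
import xml.etree.ElementTree as ET

# Shortened sample; note the newline right after the opening quotes.
bad = '''
<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
   </book>
</catalog>'''

try:
    ET.fromstring(bad)
    parsed_bad = True
except ET.ParseError as err:
    # The XML declaration is not at the start of the string.
    print("ParseError:", err)
    parsed_bad = False

# lstrip() removes the leading newline, so the declaration comes first.
root = ET.fromstring(bad.lstrip())
print(root.tag)  # catalog
```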

宏杰李

score 1 · Answer 2 · answered Jan 13 '17 at 13:11

import xml.etree.ElementTree as ET 

data2 = '''<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications with XML.</description>
   </book>
   <book id="bk112">
      <author>Galos, Mike</author>
      <title>Visual Studio 7: A Comprehensive Guide</title>
      <genre>Computer</genre>
      <price>49.95</price>
      <publish_date>2001-04-16</publish_date>
      <description>Microsoft Visual Studio 7 is explored in depth,
      looking at how Visual Basic, Visual C++, C#, and ASP+ are 
      integrated into a comprehensive development 
      environment.</description>
   </book>
</catalog>'''

data2 = data2.strip()
root = ET.fromstring(data2)

for node in root.iter():
    print(node.tag, node.text)

score 1 · Accepted Answer · answered Jan 13 '17 at 13:23

The first line must be `<?xml version="1.0"?>`, so strip the leading whitespace from data2 first with `data2.strip()`:

import xml.etree.ElementTree as ET  

data2 = '''
<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
   </book>
   <book id="bk2">
      <author>Gambardella2, Matthew2</author>
   </book>
</catalog>
'''
data2 = data2.strip()

tree2 = ET.fromstring(data2)

for book in tree2.findall('book'):
    author = book.find('author').text
    print(author)
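The same `findall('book')` loop can also read each book's `id` attribute. The sketch below builds a dict keyed by id. The dict is only an illustration and is not part of the original answer.

```python
import xml.etree.ElementTree as ET

data2 = '''
<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
   </book>
   <book id="bk2">
      <author>Gambardella2, Matthew2</author>
   </book>
</catalog>
'''

root = ET.fromstring(data2.strip())

# findall('book') returns the direct <book> children of <catalog>.
# book.get('id') reads the attribute, and findtext('author') returns the
# text of the first <author> sub-element (None if there is no such element).
authors_by_id = {book.get('id'): book.findtext('author')
                 for book in root.findall('book')}
print(authors_by_id)
```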

score 0 · Answer 4 · answered Jan 13 '17 at 13:10

Firstly, the `<?xml version...?>` declaration needs to be at the very beginning of the string.

Your string starts with a newline character, which makes the document invalid.

Bad:

data = '''
<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
        etc 
        etc'''

assert data[0] == '\n'
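This also explains why the simple `<p>` example in the question worked, even though that string starts with a newline too. Whitespace before the root element is allowed, but an XML declaration, if present, must come first. Here is a minimal sketch; the two short strings are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Both strings start with a newline; only the second has an XML declaration.
without_decl = '\n<p>\n  <name>Fred</name>\n</p>'
with_decl = '\n<?xml version="1.0"?>\n<p><name>Fred</name></p>'

# Leading whitespace before the root element is accepted.
name_text = ET.fromstring(without_decl).find('name').text
print(name_text)  # Fred

# Leading whitespace before the declaration is rejected.
position = None
try:
    ET.fromstring(with_decl)
except ET.ParseError as err:
    position = err.position  # (line, column) where the parser gave up
print(position)
```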

Good:

import xml.etree.ElementTree as ET

data = '''<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
   </book>
</catalog>'''


catalog = ET.fromstring(data)
for book in catalog:            # getchildren() was removed in Python 3.9
    for child in book:          # every child of <book>, not only <author>
        print(child.text)

Hi, thanks for that. This gives me the author but also everything after that. — nipy, Jan 13 '17 at 13:19
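Looping over a `<book>` element yields all of its children, which is why the code above prints the title and everything else as well. Below is a sketch that picks out only `<author>` by tag. The shortened two-field sample is an assumption for brevity.

```python
import xml.etree.ElementTree as ET

data = '''<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
   </book>
</catalog>'''

catalog = ET.fromstring(data)

# Iterating over a <book> yields <author>, <title>, ...,
# so select the <author> element by tag instead of printing every child.
authors = []
for book in catalog:
    author = book.find('author')
    if author is not None:
        authors.append(author.text)
        print(author.text)
```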

score -1 · Answer 5 · answered Jan 13 '17 at 12:58


Remove <?xml version="1.0"?> from data2 with a replace.

There should be a way to specify these things, but I didn't care when I stumbled upon this, as I was parsing websites with very different ideas of what valid HTML looks like.
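Here is a minimal sketch of the replace approach. It assumes the declaration appears exactly as `<?xml version="1.0"?>`. Any other spelling, such as single quotes or an added `encoding` attribute, would not be matched by this literal replace.

```python
import xml.etree.ElementTree as ET

data2 = '''
<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
   </book>
</catalog>'''

# Drop the first occurrence of the declaration. The leading newline that
# remains is allowed, because whitespace may precede the root element.
without_decl = data2.replace('<?xml version="1.0"?>', '', 1)
root = ET.fromstring(without_decl)
print(root.tag)  # catalog
```

Stripping the leading whitespace, as the other answers do, keeps the declaration and does not depend on its exact text.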

Harper04

I now get `'NoneType' object has no attribute 'text'` – nipy Jan 13 '17 at 13:13
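That error most likely comes from the question's `tree2.find('author')`, not from the parsing. With a plain tag name, `find()` only checks the direct children of `<catalog>`. Those children are `<book>` elements, so it returns `None`. Below is a sketch of two ElementTree paths that do reach the authors.

```python
import xml.etree.ElementTree as ET

data2 = '''<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
   </book>
   <book id="bk112">
      <author>Galos, Mike</author>
   </book>
</catalog>'''

root = ET.fromstring(data2)

# find('author') only looks at direct children of <catalog>, which are <book>s.
direct = root.find('author')
print(direct)  # None, so calling .text on it raises the AttributeError

# './/author' searches all descendants; 'book/author' spells out the path.
first = root.find('.//author').text
everyone = [a.text for a in root.findall('book/author')]
print(first)
print(everyone)
```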
