1

I am trying to use ElementTree with this sample data from Microsoft, which I have just copied and pasted into a string (perhaps naively).

I have put all of the XML data in a string as follows (this example is truncated, but I used the full XML):

data2 = '''
<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
        etc 
        etc'''

Then I used this code:

import xml.etree.ElementTree as ET    
tree2 = ET.fromstring(data2)
print (tree2.find('author').text)

And I get this output:

ParseError: XML or text declaration not at start of entity: line 2, column 0

However, when I try a simple example it works:

data = '''
<p>
  <name>Fred</name>
</p>'''

tree = ET.fromstring(data)
print (tree.find('name').text)

Out:

Fred

Is this because I have done a copy and paste or is my code incorrect? What am I doing wrong here?

halfer
nipy

5 Answers

1
data2 = '''<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>'''

Do not start the string with an empty line. The XML declaration must be the very first thing in the string.
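
One way to keep the declaration on its own line in the source while still avoiding the leading newline is a backslash right after the opening quotes. This is a minimal sketch using a shortened version of the catalog data; the closing tags are added here only so that the snippet parses.

```python
import xml.etree.ElementTree as ET

# The backslash after ''' suppresses the newline that would otherwise
# become the first character of the string.
data2 = '''\
<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
   </book>
</catalog>'''

root = ET.fromstring(data2)
print(data2.startswith('<?xml'))  # True
print(root.tag)                   # catalog
```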

宏杰李
1
import xml.etree.ElementTree as ET 

data2 = '''<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications with XML.</description>
   </book>
   <book id="bk112">
      <author>Galos, Mike</author>
      <title>Visual Studio 7: A Comprehensive Guide</title>
      <genre>Computer</genre>
      <price>49.95</price>
      <publish_date>2001-04-16</publish_date>
      <description>Microsoft Visual Studio 7 is explored in depth,
      looking at how Visual Basic, Visual C++, C#, and ASP+ are 
      integrated into a comprehensive development 
      environment.</description>
   </book>
</catalog>'''

data2 = data2.strip()
root = ET.fromstring(data2)

for node in root.iter():
    print(node.tag, node.text)
nguaman
1

The first line of the string must be the <?xml version="1.0"?> declaration, so call data2.strip() before parsing:

import xml.etree.ElementTree as ET  

data2 = '''
<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
   </book>
   <book id="bk2">
      <author>Gambardella2, Matthew2</author>
   </book>
</catalog>
'''
data2 = data2.strip()

tree2 = ET.fromstring(data2)

for book in tree2.findall('book'):
    author = book.find('author').text
    print(author)
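
Note that find('author') only searches direct children, which is why the loop above goes through the book elements first. Called on the catalog root, as in the question, it would return None even after the whitespace is stripped. As a sketch of an alternative, the './/author' path searches all descendants (same shortened sample data as above):

```python
import xml.etree.ElementTree as ET

data2 = '''<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
   </book>
   <book id="bk2">
      <author>Gambardella2, Matthew2</author>
   </book>
</catalog>'''

root = ET.fromstring(data2)

# find() with a bare tag name looks only at direct children of the root
print(root.find('author'))  # None

# './/author' matches author elements at any depth below the root
authors = [a.text for a in root.findall('.//author')]
print(authors)
```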
Danil.V
0

Firstly, the <?xml version... declaration needs to be at the very beginning of the string.

Your data has a newline character at the start, invalidating the format.

Bad:

data = '''
<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
        etc 
        etc'''

assert data[0] == '\n'

Good:

import xml.etree.ElementTree as ET

data = '''<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
   </book>
</catalog>'''


catalog = ET.fromstring(data)
# Iterate over the elements directly; getchildren() was removed in Python 3.9
for book in catalog:
    for author in book:
        print(author.text)
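
To see the difference between the two cases directly, the sketch below parses the same document with and without the leading newline and catches ET.ParseError. The helper name try_parse is made up for this illustration.

```python
import xml.etree.ElementTree as ET

xml_doc = '''<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
   </book>
</catalog>'''

def try_parse(text):
    """Return the root tag on success, or the ParseError on failure."""
    try:
        return ET.fromstring(text).tag
    except ET.ParseError as err:
        return err

bad = try_parse('\n' + xml_doc)               # declaration ends up on line 2
good = try_parse(xml_doc)                     # declaration is the first thing in the string
fixed = try_parse(('\n' + xml_doc).lstrip())  # stripping leading whitespace repairs it

print(type(bad).__name__, good, fixed)
```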
Vasili Syrakis
-1

Remove <?xml version="1.0"?> from data2 with a replace.

There should be a proper way to handle this, but I did not look into it at the time I stumbled upon the problem, as I was parsing websites with very different ideas of what valid HTML looks like.
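
A minimal sketch of that approach, assuming the declaration is the first non-whitespace content in the string. A regular expression is used here instead of a literal replace so that declarations carrying an encoding attribute are matched too; the pattern is an illustration, not something from the original answer. Once the declaration is gone, the leading newline is harmless, as the simple <p> example in the question shows.

```python
import re
import xml.etree.ElementTree as ET

data2 = '''
<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
   </book>
</catalog>'''

# Drop optional leading whitespace plus the <?xml ... ?> declaration
without_decl = re.sub(r'^\s*<\?xml[^>]*\?>', '', data2, count=1)

root = ET.fromstring(without_decl)
print(root.tag)
```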

Harper04