
I do not get why this works:

content = urllib2.urlopen(url)

context = etree.iterparse(content, tag='{my_ns}my_first_tag')
context = iter(context)
#for event, elem in context:
#     pass

context = etree.iterparse(content, tag='{my_ns}my_second_tag')
for event, elem in context:
     pass

whereas this doesn't work:

content = urllib2.urlopen(url)

context = etree.iterparse(content, tag='{my_ns}my_first_tag')
context = iter(context)
for event, elem in context:
     pass

context = etree.iterparse(content, tag='{my_ns}my_second_tag')
for event, elem in context:
     pass

and gives me this error:

XMLSyntaxError: Extra content at the end of the document, line 1, column 1

Can I not parse the same content twice? It seems strange that it works when I comment out only the loop rather than the whole iterparse call.

Am I forgetting to close something?

Many thanks

user3173237

1 Answer


urllib2.urlopen gives you a file-like object that you can use to read the contents of the URL you're querying.

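I'm guessing here that etree.iterparse returns an iterable object that doesn't read from content until you start iterating over it. In that case, the first loop uses context to iterate over the contents of content, "consuming" the data as it goes. That would also explain why commenting out only the loop works: the first context never reads anything, so content is still untouched when the second iterparse runs.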

When you create the second context, you're passing the same content, which is "empty" by then.
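
The same effect can be reproduced without a network call. The following is a minimal sketch, not the asker's exact setup: it assumes an in-memory io.BytesIO as a stand-in for the urlopen response and uses the standard library's xml.etree.ElementTree.iterparse instead of lxml, so the exception type and message differ from the XMLSyntaxError in the question.

```python
import io
import xml.etree.ElementTree as ET

# Assumption: BytesIO stands in for the one-shot stream returned by urlopen().
content = io.BytesIO(b"<root><a/><b/></root>")

# First pass: iterating drives the parser, which reads the stream to its end.
first_tags = [elem.tag for _, elem in ET.iterparse(content)]

# The stream position is now at the end, so a plain read returns nothing.
leftover = content.read()

# Second pass over the same, exhausted stream: the parser sees no document.
try:
    second_tags = [elem.tag for _, elem in ET.iterparse(content)]
except ET.ParseError:
    second_tags = None

print(first_tags, leftover, second_tags)
```

The first pass sees every element, while the second fails because the file-like object has nothing left to give.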

Edit: since you ask for ways to reparse, one option is to read all of the data once and then pass it to each iterparse call separately, wrapped in StringIO as the file-like object. E.g.:

from StringIO import StringIO

# ...

data = content.read()
context = etree.iterparse(StringIO(data), tag='{my_ns}my_first_tag')
# processing...
context = etree.iterparse(StringIO(data), tag='{my_ns}my_second_tag')
# processing...
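
For Python 3 the same idea might look like the sketch below. It rests on a few assumptions that are not in the original: the sample document in data is made up, io.BytesIO replaces StringIO because the response body is bytes, and the standard library's iterparse is used, so the tag filter is done by hand in the hypothetical helper texts_for (lxml's iterparse takes tag= directly, as in the snippet above).

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample; in the real code this would be data = content.read().
data = (
    b'<root xmlns="my_ns">'
    b"<my_first_tag>1</my_first_tag>"
    b"<my_second_tag>2</my_second_tag>"
    b"<my_first_tag>3</my_first_tag>"
    b"</root>"
)

def texts_for(tag):
    """Parse a fresh in-memory stream and collect the text of matching elements."""
    stream = io.BytesIO(data)  # new stream per pass, so nothing is "consumed"
    return [elem.text for _, elem in ET.iterparse(stream) if elem.tag == tag]

first = texts_for("{my_ns}my_first_tag")
second = texts_for("{my_ns}my_second_tag")
print(first, second)
```

Each call wraps the same bytes in a new stream, so the data is downloaded once and can be parsed as many times as needed.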
Ricardo Cárdenes
  • Thanks Ricardo! Do you know how to reset content without calling urlopen again? Ideally I would specify more than one tag in a single iterparse call, but I cannot figure out how to do that. Thanks again. – user3173237 Feb 18 '14 at 17:29