Questions tagged [iterparse]

iterparse is an incremental XML parsing interface (provided by Python's xml.etree.ElementTree and by lxml) that reports parse events while the element tree is being built.

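Use this tag for questions about XML parsing code that uses iterparse. iterparse builds the element tree incrementally while it parses the XML and yields (event, element) pairs as it goes, so you can process each element, and safely rearrange or remove parts of the tree once their "end" event has been reported, before the whole document has been read.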
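A minimal sketch of the usual pattern with the standard-library xml.etree.ElementTree.iterparse, assuming a small in-memory document with hypothetical `<record>` elements (a real use would pass a path to a large file instead):

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document standing in for a large file.
XML = b"""<records>
  <record id="1"><name>alpha</name></record>
  <record id="2"><name>beta</name></record>
</records>"""

def iter_names(source):
    """Yield (id, name) for each <record>, freeing each element once handled."""
    for event, elem in ET.iterparse(source, events=("end",)):
        # On an "end" event the element and all of its children are complete.
        if elem.tag == "record":
            yield elem.get("id"), elem.findtext("name")
            # Drop the element's children, attributes and text to save memory.
            elem.clear()

names = list(iter_names(io.BytesIO(XML)))
print(names)  # [('1', 'alpha'), ('2', 'beta')]
```

Acting on "end" rather than "start" events is the safer default, because only then are the element's text and children guaranteed to be present.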


83 questions
0 votes · 1 answer

Iterparse returns empty iterable when parsing xml with a default namespace

I'm parsing an XML document using iterparse. from lxml import etree import tempfile content = """ g…
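One likely cause of this symptom: once a default namespace is declared, the parser reports every tag in Clark notation (`{uri}local`), so a filter on the bare local name never matches. A minimal sketch with the standard library, assuming a hypothetical namespace URI and `<item>` elements:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document whose root declares a default namespace.
XML = b"""<feed xmlns="http://example.com/ns">
  <item>one</item>
  <item>two</item>
</feed>"""

NS = "http://example.com/ns"
ITEM = f"{{{NS}}}item"  # Clark notation: "{http://example.com/ns}item"

def item_texts(source):
    texts = []
    for _, elem in ET.iterparse(source, events=("end",)):
        # Comparing against plain "item" would never match, because the
        # parser reports the namespace-qualified tag.
        if elem.tag == ITEM:
            texts.append(elem.text)
    return texts

print(item_texts(io.BytesIO(XML)))  # ['one', 'two']
```

With lxml, the same qualified name can be passed as the `tag` argument of `etree.iterparse` to get the same filtering.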
0 votes · 1 answer

XMLSyntax error while using iterparse

I am parsing a large XML file in Python. The relevant part of the large XML file is as follows:
Dexter
0 votes · 1 answer

best practices for iterparse usage while keeping the context?

Following a question I asked about general iterparse usage (and its answer by J F Sebastian), I will reorganise my code to parse Nessus XML result files. Quoting from the earlier question, the file structure is
WoJ
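One common way to keep context with iterparse is to listen for both "start" and "end" events: record the parent's attributes when it starts, use them while its children end, and clear the parent when it ends. A minimal sketch, assuming a simplified, hypothetical structure with `ReportHost`/`ReportItem` elements rather than the real file layout from the question:

```python
import io
import xml.etree.ElementTree as ET

# Simplified, hypothetical structure; not the full format from the question.
XML = b"""<Report>
  <ReportHost name="10.0.0.1">
    <ReportItem port="22"/>
    <ReportItem port="80"/>
  </ReportHost>
  <ReportHost name="10.0.0.2">
    <ReportItem port="443"/>
  </ReportHost>
</Report>"""

def host_ports(source):
    results = []
    current_host = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start" and elem.tag == "ReportHost":
            # Attributes are already available on "start", so save the context.
            current_host = elem.get("name")
        elif event == "end" and elem.tag == "ReportItem":
            results.append((current_host, elem.get("port")))
        elif event == "end" and elem.tag == "ReportHost":
            # The host and its items are finished: free them, reset context.
            elem.clear()
            current_host = None
    return results

print(host_ports(io.BytesIO(XML)))
# [('10.0.0.1', '22'), ('10.0.0.1', '80'), ('10.0.0.2', '443')]
```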
0 votes · 1 answer

GAE Python LXML - XMLSyntaxError Specification mandate value for attribute object

I am using Google App Engine with Python and am trying to fetch a gzipped XML file and parse it with lxml's iterparse. I used the example from lxml.de to create the following code: import gzip, base64, StringIO from lxml import etree from…
Vincent
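A minimal Python 3 sketch of feeding a gzipped payload to iterparse, using the standard library instead of lxml and io.BytesIO in place of StringIO. It assumes the compressed bytes are already in memory (the fetch itself is not shown) and uses a hypothetical `<item>` payload:

```python
import gzip
import io
import xml.etree.ElementTree as ET

# Hypothetical payload; in the question the bytes come from a remote fetch.
raw_xml = b"<items><item>a</item><item>b</item></items>"
compressed = gzip.compress(raw_xml)

def parse_gzipped(data):
    # GzipFile decompresses on demand, so iterparse reads a stream of
    # decompressed bytes rather than the compressed bytes themselves.
    with gzip.GzipFile(fileobj=io.BytesIO(data)) as stream:
        return [elem.text for _, elem in ET.iterparse(stream) if elem.tag == "item"]

print(parse_gzipped(compressed))  # ['a', 'b']
```

Passing the still-compressed bytes straight to the parser would make it see binary data instead of XML, which is why the decompressing wrapper sits between the two.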
0 votes · 2 answers

How to skip a node which raises an error when using cElementTree.iterparse()

I am trying to parse a very big XML file, lowercase its text and remove punctuation. The problem is that when I parse this file using the cET parse function for big files, at some point it comes across a badly formatted tag or character which…
user1262403
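With the standard library, iterparse cannot skip a malformed node: the underlying expat parser stops at the first well-formedness error. A minimal sketch of the fallback that is available, keeping whatever was processed before the error, assuming a hypothetical input with `<doc>` elements (getting past the bad markup would need a different parser, for example lxml with `XMLParser(recover=True)`):

```python
import io
import string
import xml.etree.ElementTree as ET

# Hypothetical input: the third <doc> is closed with a misspelled tag.
BROKEN = b"<docs><doc>First!</doc><doc>Second?</doc><doc>Bad</dco></docs>"

STRIP_PUNCT = str.maketrans("", "", string.punctuation)

def normalise_until_error(source):
    """Lowercase and de-punctuate each <doc>, stopping at the first parse error."""
    texts = []
    try:
        for _, elem in ET.iterparse(source, events=("end",)):
            if elem.tag == "doc":
                texts.append(elem.text.lower().translate(STRIP_PUNCT))
                elem.clear()
    except ET.ParseError as err:
        # expat cannot resume after this error; err.position is the
        # (line, column) where parsing stopped.
        print("parse error at", err.position)
    return texts

texts = normalise_until_error(io.BytesIO(BROKEN))
print(texts)  # ['first', 'second']
```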
0 votes · 1 answer

Can't iterate over children's children because of the subsequent .clear()?

I'm trying to use the pattern described in the "event-driven parsing" section of the lxml tutorial. In my code I'm calling a function that can recursively run on elements using the iterchildren() method. I'll just use two nested loops for…
Lev Levitsky
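The usual way around this is to clear only at the outer element you actually process, not at every "end" event, so that nested descendants are still attached when you walk them. A minimal sketch with the standard library, assuming a hypothetical `<a>/<b>/<c>` nesting (plain iteration over an element plays the role of lxml's iterchildren() here):

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical nesting: each <a> has <b> children, each <b> has <c> children.
XML = b"""<root>
  <a><b><c>1</c><c>2</c></b></a>
  <a><b><c>3</c></b></a>
</root>"""

def grandchildren_per_a(source):
    out = []
    for _, elem in ET.iterparse(source, events=("end",)):
        # Act only on the outer element. Clearing every element on its own
        # "end" event would empty <b> and <c> before <a> is reached.
        if elem.tag == "a":
            out.append([c.text for b in elem for c in b])
            elem.clear()
    return out

print(grandchildren_per_a(io.BytesIO(XML)))  # [['1', '2'], ['3']]
```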
0 votes · 3 answers

Getting subelements using lxml and iterparse

I am trying to write a parsing algorithm to efficiently pull data from an XML document. I am currently rolling through the document based on elements and children, but would like to use iterparse instead. One issue is that I have a list of elements…
Sam Johnson
-1 votes · 1 answer

OOM when using iterparse on huge XML dump file

Reading the large Stack Overflow XML dump file (Posts.xml, ~90 GB) with the following approach: from xml.etree.cElementTree import iterparse for evt, elem in iterparse("Posts.xml", events=('end',)): if elem.tag == 'row': user_fields =…
Celso França
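A frequently recommended fix is to clear the root as well: `elem.clear()` empties each row, but the root still keeps an (empty) element object for every row, which adds up over millions of rows. A minimal sketch of that pattern, assuming a tiny hypothetical `<posts>` sample with a `Score` attribute. It uses xml.etree.ElementTree, because xml.etree.cElementTree was deprecated in Python 3.3 and removed in 3.9 (ElementTree uses the C accelerator automatically):

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a huge dump: flat <row> children under one root.
XML = b"""<posts>
  <row Id="1" Score="5"/>
  <row Id="2" Score="3"/>
  <row Id="3" Score="7"/>
</posts>"""

def total_score(source):
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is the "start" of the root element; keep a handle to it.
    _, root = next(context)
    total = 0
    for event, elem in context:
        if event == "end" and elem.tag == "row":
            total += int(elem.get("Score"))
            # Clearing only the row would leave an empty <row> attached to
            # the root for every record; clearing the root drops them all.
            root.clear()
    return total

print(total_score(io.BytesIO(XML)))  # 15
```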