Questions tagged [iterparse]

iterparse is an XML parsing function that reports events while the tree is being built, so the tree can be inspected or changed during parsing.

Use this tag for questions about XML parsing code that relies on iterparse. iterparse still builds an element tree as it parses the XML, and you can safely rearrange or remove parts of that tree while parsing.
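
As a minimal sketch of the behaviour described above, the following uses `iterparse` from Python's standard `xml.etree.ElementTree` module. The sample document, the `item` and `name` tags, and the helper `iter_item_names` are made up for illustration; a real input would usually be a large file rather than an in-memory buffer.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document, standing in for a large file on disk.
SAMPLE = (
    b"<catalog>"
    b"<item id='1'><name>apple</name></item>"
    b"<item id='2'><name>pear</name></item>"
    b"</catalog>"
)

def iter_item_names(source):
    """Yield (id, name) for each <item>, clearing each item once it is read."""
    # Only "end" events are requested, so an element is complete when seen.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "item":
            yield elem.get("id"), elem.find("name").text
            # Drop the finished element's children, text and attributes.
            elem.clear()

names = list(iter_item_names(io.BytesIO(SAMPLE)))
print(names)
```

Clearing only the elements that have already been processed is what keeps memory use low on large inputs.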

83 questions
2
votes
2 answers

iterparse is throwing 'no element found: line 1, column 0' and I'm not sure why

I have a network application (using Twisted) that receives chunks of XML over the internet (the full XML message may not arrive in a single packet). My thought process is to slowly build the XML message as it's received. I've "settled"…
notorious.no
  • 4,919
  • 3
  • 20
  • 34
2
votes
1 answer

XML parser using iterparse 'loses' children

I appreciate your help on the following: I need to read a large XML file and convert it to CSV. I have two functions that are supposed to do the same thing, only that one (function1) uses iterparse (because I need to process about 2GB files) and another…
2
votes
2 answers

iterparse not getting children text

EDIT: I found a way to make it work. It turns out I had an elem.clear() call that I didn't show in the code below. I apologize for that. I modified it so you can see how it was. It turns out that if I move that call inside the if statement the…
eliasvc
  • 21
  • 4
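
One common way to end up with missing child text, sketched here under assumptions and not taken from the asker's code: with "end" events, children finish before their parent, so calling `elem.clear()` on every element empties the children before the parent is read. The sample data, tag names, and the `read_rows` helper with its `clear_everything` flag are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data for illustration only.
SAMPLE = b"<rows><row><a>1</a><b>2</b></row><row><a>3</a><b>4</b></row></rows>"

def read_rows(source, clear_everything):
    """Collect (a, b) text pairs from each <row> element."""
    rows = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row":
            rows.append((elem.find("a").text, elem.find("b").text))
            # Safe: the row has been fully read at this point.
            elem.clear()
        elif clear_everything:
            # Too early: this wipes <a> and <b> before their <row> ends.
            elem.clear()
    return rows

broken = read_rows(io.BytesIO(SAMPLE), clear_everything=True)
fixed = read_rows(io.BytesIO(SAMPLE), clear_everything=False)
print(broken)
print(fixed)
```

Keeping the `clear()` call inside the tag check means an element is only emptied after its data has been captured.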
2
votes
1 answer

Parsing same content twice with lxml.iterparse

I do not get why this works: content = urllib2.urlopen(url) context = etree.iterparse(content, tag='{my_ns}my_first_tag') context = iter(context) #for event, elem in context: # pass context = etree.iterparse(content,…
user3173237
  • 111
  • 2
  • 11
2
votes
1 answer

iterparse fails to parse a field, while other similar ones are fine

I use Python's iterparse to parse the XML result of a Nessus scan (.nessus file). The parsing fails on unexpected records, while similar ones have been parsed correctly. The general structure of the XML file is a lot of records like the one…
WoJ
  • 27,165
  • 48
  • 180
  • 345
1
vote
2 answers

Python iterparse large XML while filtering with elements and children

I am attempting to parse product data from Icecat. The data comes in large XML files (3-7 GB). In order to reduce the amount of product data I am bringing in, I need to filter this list before moving to my next step. In particular, I need to filter by…
Steve
  • 588
  • 4
  • 17
1
vote
1 answer

Python LXML etree.iterparse. Check if current element complies with XPath

I would like to read a fairly large XML file as a stream, but I could not find any way to use my old XPaths to find elements. Previously the files were of moderate size, so it was enough to: all_elements = [] for xpath in list_of_xpathes: …
Alexandr Crit
  • 111
  • 1
  • 14
1
vote
1 answer

How to iteratively parse a large XML file in Python?

I need to process an approximately 8 GB XML file. The file structure is (simplified) similar to the below: zzz ....and so on for thousands of rows …
Raits
  • 85
  • 9
1
vote
0 answers

Encoding error in LXML etree.iterparse() but not in etree.parse()

I am trying to parse a ~200MB XML file using LXML. I was stupidly doing etree.parse(xml_path), without any encoding parameter as argument, and then using iterwalk() to iterate over some child nodes, thinking that it would lower memory consumption.…
Kevin Doshi
  • 13
  • 1
  • 6
1
vote
1 answer

Is there a way to skip nodes/elements with iterparse lxml?

Is there a way, using lxml iterparse, to skip an element without checking the tag? Take this XML for example: text1 text2 text3 text4
Dan
  • 758
  • 6
  • 20
1
vote
1 answer

Python lxml iterparse sort by attribute large xml file

I have a large XML file in which I'm trying to order the icons for each programme. I want to order the icons in descending order by the value of the width attribute. I've managed to delete certain icons that are not needed, but I'm unsure how I can order the…
Jamie B
  • 21
  • 2
1
vote
2 answers

iterparse elements getting cleared before I can capture the data

I'm trying to use Python to parse a large XML file (27GB) using cElementTree and iterparse. I'm able to extract all the tags, but for some reason none of the element text is being retrieved (it's always showing 'None'). I've checked the documentation…
1
vote
1 answer

python lxml iterparse() is skipping first event

I am using iterparse() from python lxml to parse through a large XML file and get relevant data. This works perfectly fine, except for the first time an event occurs. The data for the first node is not captured. The same thing happens for when I…
kratzlos
  • 35
  • 1
  • 7
1
vote
1 answer

How to write with iterparse?

I am trying to loop through an XML document, find some tags, combine them into a single new tag, and then write the result back to the XML document using the ElementTree module in Python. I have the code to the point where I believe it would work, but when I get to the…
Sam L
  • 162
  • 1
  • 8
1
vote
2 answers

Python tree.iterparse export source XML of selected element including all descendants

Python 3.4, parsing multi-gigabyte XML Wikipedia dump files using etree.iterparse. I want to test the value of the currently matched element and, depending on that value, export the source XML of the whole object and…
mwra
  • 317
  • 3
  • 11