I have a really weird problem with lxml. I'm trying to parse my XML file with `iterparse` as follows:
```python
from lxml import etree

for event, elem in etree.iterparse(input_file, events=('start', 'end')):
    if elem.tag == 'tuv' and event == 'start':
        if elem.get('{http://www.w3.org/XML/1998/namespace}lang') == 'en':
            if elem.find('seg') is not None:
                pass  # write_in_some_file: write elem.find('seg').text to the English output
        elif elem.get('{http://www.w3.org/XML/1998/namespace}lang') == 'de':
            if elem.find('seg') is not None:
                pass  # write_in_some_file: write elem.find('seg').text to the German output
```
It is pretty simple and works almost perfectly. In short, it goes through my XML file, and if an element is a `<tuv>`, it checks whether the language attribute is `'en'` or `'de'`. It then checks whether the `<tuv>` has a `<seg>` child, and if so, writes its text into a file.
There is ONE `<seg>` in the file that seems not to exist: `elem.find('seg')` returns `None` for it. Here it is on its own, `<seg>! keine Spalten und Ventile</seg>`, and in its context below.

I don't understand why this tag, which looks perfectly fine, causes a problem (I can't use its `.text`). Note that every other `<seg>` tag is found without any issue.
```xml
<tu tuid="235084307" datatype="Text">
  <prop type="score">1.67647</prop>
  <prop type="score-zipporah">0.6683</prop>
  <prop type="score-bicleaner">0.7813</prop>
  <prop type="lengthRatio">0.740740740741</prop>
  <tuv xml:lang="en">
    <prop type="source-document">http://www.beviclean.de/en/shop/product-details/artikel/bevi-accessoires/34/7969ccc9b6/bevi-clean-ball.html</prop>
    <prop type="source-document">http://www.beviclean.de/en/shop/product-details/artikel/bevi-accessoires/34//bevi-clean-ball.html</prop>
    <seg>! no gaps and valves</seg>
  </tuv>
  <tuv xml:lang="de">
    <prop type="source-document">http://www.beviclean.de/en/shop/product-details/artikel/bevi-accessoires/34/7969ccc9b6/bevi-clean-ball.html</prop>
    <prop type="source-document">http://www.beviclean.de/en/shop/product-details/artikel/bevi-accessoires/34//bevi-clean-ball.html</prop>
    <seg>! keine Spalten und Ventile</seg>
  </tuv>
</tu>
```
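One possible explanation worth checking: the lxml documentation notes that the text, tail and children of an element are not necessarily present yet when the `'start'` event is received, and only the `'end'` event guarantees the element has been parsed completely. The loop above looks for `<seg>` on the `'start'` event of `<tuv>`, so whether the child is already there may depend on where the parser's input buffer happens to end. Below is a minimal sketch of the same extraction done on the `'end'` event instead. `extract_segments`, `XML_LANG` and the trimmed `SAMPLE` document (with an assumed TMX-style `<tmx><body>` wrapper) are hypothetical names for illustration, and the sketch falls back to the stdlib `xml.etree.ElementTree` if lxml isn't installed.

```python
import io

try:
    from lxml import etree  # the library used in the question
except ImportError:  # fallback so the sketch also runs with only the stdlib
    import xml.etree.ElementTree as etree

# Clark-notation key for the xml:lang attribute (hypothetical constant name).
XML_LANG = '{http://www.w3.org/XML/1998/namespace}lang'


def extract_segments(source, langs=('en', 'de')):
    """Yield (lang, text) for every <tuv> whose xml:lang is in `langs`.

    Only 'end' events are requested: at that point the <tuv> element and
    all of its children (including <seg>) have been fully parsed.
    """
    for event, elem in etree.iterparse(source, events=('end',)):
        if elem.tag != 'tuv':
            continue
        lang = elem.get(XML_LANG)
        if lang in langs:
            seg = elem.find('seg')
            if seg is not None:
                yield lang, seg.text
        # Drop the processed subtree to keep memory usage low on big files.
        elem.clear()


# Trimmed, hypothetical sample based on the <tu> shown above.
SAMPLE = b"""<tmx><body>
<tu tuid="235084307" datatype="Text">
<tuv xml:lang="en"><seg>! no gaps and valves</seg></tuv>
<tuv xml:lang="de"><seg>! keine Spalten und Ventile</seg></tuv>
</tu>
</body></tmx>"""

for lang, text in extract_segments(io.BytesIO(SAMPLE)):
    print(lang, text)
```

Reading `<seg>` on `'end'` trades a little latency (you only see the `<tuv>` once it is closed) for the guarantee that its children exist; calling `elem.clear()` afterwards keeps the streaming benefit of `iterparse`.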