I am trying to replicate the example from this tutorial, but using iterparse with elem.clear().
XML example:
<?xml version="1.0" encoding="UTF-8"?>
<scenario>
  <world>
    <region name="USA">
      <AgSupplySector name="Corn" nocreate="1">
        <AgSupplySubsector name="Corn_NelsonR" nocreate="1">
          <AgProductionTechnology name="Corn_NelsonR" nocreate="1">
            <period year="1975">
              <Non-CO2 name="SO2_1_AWB">
                <input-emissions>3.98749e-05</input-emissions>
                <output-driver/>
                <gdp-control name="GDP_control">
                  <max-reduction>60</max-reduction>
                  <steepness>3.5</steepness>
                </gdp-control>
              </Non-CO2>
              <Non-CO2 name="NOx_AWB">
                <input-emissions>0.000285263</input-emissions>
                <output-driver/>
                <gdp-control name="GDP_control">
                  <max-reduction>60</max-reduction>
                  <steepness>3.5</steepness>
                </gdp-control>
              </Non-CO2>
            </period>
          </AgProductionTechnology>
        </AgSupplySubsector>
      </AgSupplySector>
    </region>
  </world>
</scenario>
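The expected output is a CSV file with one row per input-emissions element and the columns Year, Gas, Value, Technology, Crop, Country.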
I am trying to parse it using the following code:
import os
import xml.etree.ElementTree as etree  # cElementTree was removed in Python 3.9
import codecs
import csv

PATH = r'D:\Book1'  # raw string so the backslash is kept literally
FILENAME_BIO = 'Test.csv'
FILENAME_XML = 'all_aglu_emissions.xml'
ENCODING = "utf-8"

pathBIO = os.path.join(PATH, FILENAME_BIO)
pathXML = os.path.join(PATH, FILENAME_XML)

with codecs.open(pathBIO, "w", ENCODING) as bioFH:
    bioWriter = csv.writer(bioFH, quoting=csv.QUOTE_MINIMAL)
    bioWriter.writerow(['Year', 'Gas', 'Value', 'Technology', 'Crop', 'Country'])

    for event, elem in etree.iterparse(pathXML, events=('start', 'end')):
        # Remember the attributes of the enclosing elements as they open.
        if event == 'start' and elem.tag == 'region':
            str1 = elem.attrib['name']
        elif event == 'start' and elem.tag == 'AgSupplySector':
            str2 = elem.attrib['name']
        elif event == 'start' and elem.tag == 'AgProductionTechnology':
            str3 = elem.attrib['name']
        elif event == 'start' and elem.tag == 'period':
            str4 = elem.attrib['year']
        elif event == 'start' and elem.tag == 'Non-CO2':
            str5 = elem.attrib['name']
        # Write one row when an input-emissions element closes.
        elif event == 'end' and elem.tag == 'input-emissions':
            for em in elem.iter('input-emissions'):
                str6 = em.text
                bioWriter.writerow([str4, str5, str6, str3, str2, str1])
        # Runs on every event, both 'start' and 'end'.
        elem.clear()
My issue is that I get extra rows in which the str6 field is empty. I probably have a nesting problem. Please help.
Error example (0 fields appear):
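A possible explanation, offered as an assumption rather than a confirmed diagnosis: the code above calls elem.clear() on 'start' events as well as 'end' events. At a 'start' event iterparse only guarantees that the attributes are available. The element's text may or may not have been parsed yet, so clearing at that point could discard the text of input-emissions before its 'end' event is handled. The sketch below clears only on 'end' events. The helper name extract_rows, the CONTEXT_ATTRS table, and the trimmed inline sample are hypothetical and used for illustration only; the real code would write to the CSV file instead of collecting a list.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical lookup: tag -> attribute to remember as row context.
CONTEXT_ATTRS = {
    "region": "name",
    "AgSupplySector": "name",
    "AgProductionTechnology": "name",
    "period": "year",
    "Non-CO2": "name",
}


def extract_rows(source):
    """Hypothetical helper: collect [year, gas, value, technology, crop, country] rows."""
    context = {}
    rows = []
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            # Attributes are available at 'start'; text is not guaranteed yet.
            if elem.tag in CONTEXT_ATTRS:
                context[elem.tag] = elem.attrib[CONTEXT_ATTRS[elem.tag]]
            continue
        # From here on the event is 'end', so the element is complete.
        if elem.tag == "input-emissions":
            rows.append([
                context["period"],
                context["Non-CO2"],
                elem.text,
                context["AgProductionTechnology"],
                context["AgSupplySector"],
                context["region"],
            ])
        # Free memory only after the element has been fully read.
        elem.clear()
    return rows


# Trimmed copy of the XML from the question, used as an inline sample.
sample = b"""<scenario><world><region name="USA">
<AgSupplySector name="Corn" nocreate="1">
<AgSupplySubsector name="Corn_NelsonR" nocreate="1">
<AgProductionTechnology name="Corn_NelsonR" nocreate="1">
<period year="1975">
<Non-CO2 name="SO2_1_AWB">
<input-emissions>3.98749e-05</input-emissions><output-driver/>
</Non-CO2>
<Non-CO2 name="NOx_AWB">
<input-emissions>0.000285263</input-emissions><output-driver/>
</Non-CO2>
</period>
</AgProductionTechnology></AgSupplySubsector></AgSupplySector>
</region></world></scenario>"""

rows = extract_rows(io.BytesIO(sample))
for row in rows:
    print(row)
```

Keeping the context in a dictionary keyed by tag avoids the separate str1 to str5 variables, and the single elem.clear() at the end of the 'end' branch keeps the memory benefit of iterparse without touching elements that are still being built.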