I am currently parsing XML in Python 3.x. For files up to about 300 MB the code below works without any issues. However, when the file size grows to 500 MB or into the GB range, I run into memory problems.
from collections import OrderedDict
from lxml import etree

tree2 = etree.parse(xmlfile2)  # builds the whole tree in memory
root2 = tree2.getroot()
df_list2 = []
for i, child in enumerate(root2):  # each <cmData> element
    # <header> carries only log metadata, so only managedObject elements are read
    for subchildren in child.findall('{raml20.xsd}managedObject'):
        xml_class_name2 = subchildren.get('class')
        xml_dist_name2 = subchildren.get('distName')
        for subchild in subchildren:  # each <p> parameter
            df_dict2 = OrderedDict()
            header2 = subchild.attrib.get('name')
            df_dict2['MOClass'] = xml_class_name2
            df_dict2['CellDN'] = xml_dist_name2
            df_dict2['Parameter'] = header2
            df_dict2['CurrentValue'] = subchild.text
            df_list2.append(df_dict2)
I came across various articles explaining the use of 'iterparse', but I cannot work out how to use it while still saving the XML data in order. Below is the format of my XML:
<raml version="2.0" xmlns="raml20.xsd">
  <cmData type="plan" scope="all" name="XML_Plan_update.xml">
    <header>
      <log dateTime="2018-12-31T16:13:28" action="created" appInfo="PlanExporter"/>
    </header>
    <managedObject class="WNCEL" version="LN2.0" distName="PLMN-PLMN/MRBTS-137/WNBTS-1/WNCEL-27046" operation="update">
      <p name="defaultCarrier">10787</p>
      <p name="lCelwDN">MRBTS-137/MNL-1/MNLENT-1/CELLMAPPING-1/LCELW-4</p>
      <p name="maxCarrierPower">460</p>
    </managedObject>
    <managedObject class="WNCEL" version="LN2.0" distName="PLMN-PLMN/MRBTS-6770/WNBTS-1/WNCEL-26925" operation="update">
      <p name="defaultCarrier">10787</p>
      <p name="lCelwDN">MRBTS-6770/MNL-1/MNLENT-1/CELLMAPPING-1/LCELW-5</p>
      <p name="maxCarrierPower">460</p>
    </managedObject>
    <managedObject class="WNCEL" version="LN2.0" distName="PLMN-PLMN/MRBTS-806/WNBTS-1/WNCEL-22661" operation="update">
      <p name="defaultCarrier">10762</p>
      <p name="lCelwDN">MRBTS-806/MNL-1/MNLENT-1/CELLMAPPING-1/LCELW-9</p>
      <p name="maxCarrierPower">460</p>
    </managedObject>
  </cmData>
</raml>
I am currently using cElementTree or lxml to parse the XML and store the output of the for loop in an OrderedDict. Each dict is then appended to a list. I am looking for a way to use the iterparse method to parse the above XML into ordered dicts.
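For reference, here is a minimal sketch of what an iterparse-based version of the loop might look like. It uses the standard library's `xml.etree.ElementTree.iterparse` rather than lxml, and the names `iter_rows`, `MANAGED_OBJECT`, `ancestors` and `SAMPLE` are hypothetical, introduced only for illustration. It assumes every `managedObject` is a direct child of `cmData`, as in the format above, and that the file is otherwise well-formed.

```python
import io
import xml.etree.ElementTree as ET
from collections import OrderedDict

MANAGED_OBJECT = '{raml20.xsd}managedObject'  # namespace-qualified tag


def iter_rows(source):
    """Yield one OrderedDict per child of each managedObject, in document order.

    `source` may be a file name or a binary file object.
    """
    ancestors = []  # elements that are currently open, outermost first
    for event, elem in ET.iterparse(source, events=('start', 'end')):
        if event == 'start':
            ancestors.append(elem)
            continue
        ancestors.pop()  # 'end' event: elem and its children are complete
        if elem.tag != MANAGED_OBJECT:
            continue
        for subchild in elem:
            row = OrderedDict()
            row['MOClass'] = elem.get('class')
            row['CellDN'] = elem.get('distName')
            row['Parameter'] = subchild.get('name')
            row['CurrentValue'] = subchild.text
            yield row
        # Detach the finished managedObject from its parent so that the
        # tree does not keep growing while the rest of the file is read.
        if ancestors:
            ancestors[-1].remove(elem)


# Small in-memory stand-in for the real file (hypothetical, shortened data).
SAMPLE = b"""<raml version="2.0" xmlns="raml20.xsd">
  <cmData type="plan" scope="all" name="XML_Plan_update.xml">
    <header><log action="created"/></header>
    <managedObject class="WNCEL" distName="PLMN-PLMN/MRBTS-137/WNBTS-1/WNCEL-27046">
      <p name="defaultCarrier">10787</p>
      <p name="maxCarrierPower">460</p>
    </managedObject>
  </cmData>
</raml>"""

df_list2 = list(iter_rows(io.BytesIO(SAMPLE)))
```

Because events arrive in document order, the rows come out in the same order as with the original loop. Note that collecting every row into one list may itself become large for multi-GB input, so consuming the generator in batches (for example, writing each batch to disk) is probably the safer pattern there.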