I have tried about 15 answers from Stack Overflow, but none of them helped. I want to convert any (!) nested XML file to CSV without hard-coding the XML element names in my program. Many online services exist where I can upload any XML file and download a CSV, without having to say which concrete elements I want.
If something is repeated (the classic example is items), I want one row per item, with the data from the document's header and footer repeated on every row. Maybe this can be called a Cartesian product. A nice example of what I want is http://convertcsv.com/xml-to-csv.htm with YES for "Pivot data down instead of flattening" (optional, in Step 2).
My XML example:
<fav>
  <inv>
    <number>202101</number>
    <item>
      <q>50</q>
      <note>AAA</note>
      <more>999999999</more>
    </item>
    <adr>Bananos 15</adr>
    <item>
      <q>150</q>
      <note>BBB</note>
      <item_adr>Something...</item_adr>
    </item>
    <summary>
      <sum>500</sum>
    </summary>
  </inv>
  <inv>
    <number>202102</number>
    <item>
      <q>99950</q>
      <note>XXX</note>
      <item_adr3>Appleos 50</item_adr3>
    </item>
    <item>
      <q>150</q>
      <note>YYY</note>
    </item>
  </inv>
</fav>
This is the result I want:
number,item/0/q,item/0/note,item/0/more,adr,summary/sum,item/0/item_adr3
202101,50,AAA,999999999,Bananos 15,500,
202101,150,BBB,,Bananos 15,500,
202102,99950,XXX,,,,Appleos 50
202102,150,YYY,,,,
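I tried working with xmltodict:
import os
import xmltodict  # https://github.com/martinblech/xmltodict

# list_all_xml_files is a helper defined elsewhere; it returns the XML file paths
inputfiles = list_all_xml_files(os.getcwd())
for file in inputfiles:
    # Read the whole file, then parse it into nested dicts and lists
    with open(file, "r", encoding="utf-8") as handle:
        content = handle.read()
    data = xmltodict.parse(content)
The result is: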
OrderedDict([('fav', OrderedDict([('inv', [OrderedDict([('number', '202101'), ('item', [OrderedDict([('q', '50'), ('note', 'AAA'), ('more', '999999999')]), OrderedDict([('q', '150'), ('note', 'BBB'), ('item_adr', 'Something...')])]), ('adr', 'Bananos 15'), ('summary', OrderedDict([('sum', '500')]))]), OrderedDict([('number', '202102'), ('item', [OrderedDict([('q', '99950'), ('note', 'XXX'), ('item_adr3', 'Appleos 50')]), OrderedDict([('q', '150'), ('note', 'YYY')])])])])]))])
But what next?
I need to:
- flatten the data
- create rows (with combinations for repeated items (Cartesian product?))
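To make these two steps concrete, here is a minimal sketch of the kind of thing I mean. It assumes the parsed data is plain nested dicts and lists, shaped like the xmltodict result above. The function names `expand` and `to_csv` are hypothetical, and the column names it produces (`item/q` rather than `item/0/q`) are just one possible naming, so the header differs from the desired result listed earlier.

```python
import csv
import io


def expand(node, path=""):
    """Return a list of flat row dicts for a nested dict/list structure."""
    if isinstance(node, dict):
        rows = [{}]
        for key, value in node.items():
            child_path = f"{path}/{key}" if path else key
            child_rows = expand(value, child_path)
            # Cartesian product of the rows built so far with the child's rows.
            rows = [{**left, **right} for left in rows for right in child_rows]
        return rows
    if isinstance(node, list):
        # Repeated elements: each one contributes its own rows ("pivot down").
        rows = [row for item in node for row in expand(item, path)]
        return rows or [{}]
    # Leaf value: a single row with a single column named after the path.
    return [{path: node}]


def to_csv(rows):
    """Write the rows to CSV text, using columns in first-seen order."""
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Plain-dict copy of the parsed example (same shape as the xmltodict result).
sample = {"fav": {"inv": [
    {"number": "202101",
     "item": [{"q": "50", "note": "AAA", "more": "999999999"},
              {"q": "150", "note": "BBB", "item_adr": "Something..."}],
     "adr": "Bananos 15",
     "summary": {"sum": "500"}},
    {"number": "202102",
     "item": [{"q": "99950", "note": "XXX", "item_adr3": "Appleos 50"},
              {"q": "150", "note": "YYY"}]},
]}}

rows = expand(sample["fav"]["inv"])
print(to_csv(rows))
```

Starting from `sample["fav"]["inv"]` keeps the column names short. Starting from the root would work too, but would prefix every column with `fav/inv/`. Note that two independent repeated lists under the same parent would be multiplied together, which may or may not be what is wanted.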
Nothing I found on Stack Overflow worked correctly for my example.
Can you help me? I hope I am not the first person in the universe to have solved this. Thank you very much...
good ideas ?`. – Michael Kay Jul 22 '21 at 22:59