0

I tried about about 15 answers from Stackoverflow, but nothing help me. I want convert any(!) nested XML file to CSV. I dont want write in programm direct elements of XML. On web exist many online services, there I can upload any xml and I can download CSV. I dont need say what concrete elements I wanted.

If something is repeted (classic examples - items) I want one item to one row with every data on headers and footers of documents. Maybe we can call Cartesian product. Nice example (what i want) show http://convertcsv.com/xml-to-csv.htm with YES for "Pivot data down instead of flattening" (Optional in Step 2)

My XML example:

<fav>
    <inv>
        <number>202101</number>
        <item>
            <q>50</q>
            <note>AAA</note>
            <more>999999999</more>
        </item>
        <adr>Bananos 15</adr>
        <item>
            <q>150</q>
            <note>BBB</note>
            <item_adr>Something...</item_adr>
        </item>
        <summary>
            <sum>500</sum>
        </summary>
    </inv>
    <inv>
        <number>202102</number>
        <item>
            <q>99950</q>
            <note>XXX</note>
            <item_adr3>Appleos 50</item_adr3>
        </item>
        <item>
            <q>150</q>
            <note>YYY</note>
        </item>
    </inv>
</fav>

What I want, this result:

number,item/0/q,item/0/note,item/0/more,adr,summary/sum,item/0/item_adr3
202101,50,AAA,999999999,Bananos 15,500,
202101,150,BBB,,Bananos 15,500,
202102,99950,XXX,,,,Appleos 50
202102,150,YYY,,,,

I tried work with xmltodict.

inputfiles = list_all_xml_files(os.getcwd())
for file in inputfiles:
    handle = open(file, "r", encoding='utf-8')
    content = handle.read()
    # https://github.com/martinblech/xmltodict
    dict = xmltodict.parse(content)

Result is:

OrderedDict([('fav', OrderedDict([('inv', [OrderedDict([('number', '202101'), ('item', [OrderedDict([('q', '50'), ('note', 'AAA'), ('more', '999999999')]), OrderedDict([('q', '150'), ('note', 'BBB'), ('item_adr', 'Something...')])]), ('adr', 'Bananos 15'), ('summary', OrderedDict([('sum', '500')]))]), OrderedDict([('number', '202102'), ('item', [OrderedDict([('q', '99950'), ('note', 'XXX'), ('item_adr3', 'Appleos 50')]), OrderedDict([('q', '150'), ('note', 'YYY')])])])])]))])

But, what next?

I request

  1. flatten the data
  2. create rows (with combinations for repeted of items (Cartesian product?))

Nothing what I found in Stackoverflow did not work correct for my example.

Can you help me? I hope, that I am not first person in Universum what solved it. Thank you very much...

  • One example does not make a specification. If you want to convert ANY xml to csv, then how would you like to represent this, for example: `any
    good ideas ?
    `.
    – Michael Kay Jul 22 '21 at 22:59
  • Like the service convertcsv.com with optional (describe above) ... you can try. – Ivan Koubek Jul 23 '21 at 09:50
  • So you want us to provide code that implements a specification obtained by reverse engineering the behaviour of an existing site? – Michael Kay Jul 23 '21 at 10:06
  • it will not be necessary Michael... Maybe it seems too simple to me... If some elements will be repeat - I want create new row into CSV. The site convertcsv.com I wrote only as fact that must exist some logic solution convert ANY XML to CSV without describe concrete elements. My question is, how to the same result with help strong Python. And your example? Its XML from web site. My case are bussines data XML (invoices, purch. orders etc) ... but it will be goog, when the script convert your example like CSV with 3 columns "a_hr", "a_id" and "a__text" with row with data ,5,any good ideas ?... – Ivan Koubek Jul 23 '21 at 15:47
  • You say "my cases are business data XML" but you started out by saying the solution had to handle ANY XML. If the requirement is only to handle XML without mixed content, or XML without attributes, or XML without comments, or XML without namespaces, then the problem becomes much more tractable. – Michael Kay Jul 23 '21 at 15:52
  • Yes my XML files are business data (mainly invoices, purch. orders etc). Namespace NO. Attributes NO. Comments NO. The content of XML is similar of my example in body of my question. But my main problem is how "chop" from Dictionary to rows for save to CSV elegantly. With headers (for example element ) and footers (for example element ) data for every repeated elements (). Excuse me Michael, that my infomration were not complet. I presumed that my open point is processing in Python of data from dictionary to rows mainly. – Ivan Koubek Jul 23 '21 at 20:18
  • I'd suggest that having clarified in your own mind what the requirement is, you start again from the beginning with a new question that explains more clearly what class of XML documents you want to process, and what algorithm you want to apply to them. – Michael Kay Jul 23 '21 at 20:55

0 Answers0