0

I have the below xml that i would want to convert to csv ( delimiter like '|' or octal ) using python. I tried converting the xml into dict and then to csv . I am looking if there are any easy or efficient methods to do so.

Problems with the below code :

  1. There is a tag xyz , which isn't needed in csv , so how to escape or ignore that? All i want is the data from tag abc.
  2. Would want the nested tag as prefix to underlying keys
  3. Expected output is a line with keys as header and the values below it.

XML sample :

    <?xml version="1.0" encoding="utf-8"?>
    <xyz date="2019-07-01T09:00:29">
      <abc>
        <id>23</id>
        <uniqueid>23_0</uniqueid>
        <Name></Name>
        <Rate>
          <mrp>6.40000</mrp>
          <discount>10.00%</discount>
          <discountmonths>2</discountmonths>
        </Rate>
        <fee>
          <type>off</type>
          <minimumfee>£1,500.75</minimumfee>
          <maxfee>£10K</maxfee>
        </fee>
      </abc>
      <abc>
        <id>35</id>
        <uniqueid>35_0</uniqueid>
        <Name></Name>
        <Rate>
          <mrp>7.90000</mrp>
          <discount>5.00%</discount>
          <discountmonths>5</discountmonths>
        </Rate>
        <fee>
          <type>offer</type>
          <minimumfee>£1k</minimumfee>
          <maxfee>£22,000</maxfee>
        </fee>
      </abc>
    </xyz>

Code : ( don't mind the immaturity in the code, just a beginner , mostly a copy paste from multiple sites.. trying to understand if this works and then thought i will know where to start or modify the existing )

    import xml.etree.ElementTree as ET
    import xmltodict
    import csv
    tree = ET.parse('myxml_1.xml')
    xml_data = tree.getroot()
    xmlstr = ET.tostring(xml_data, encoding='utf-8', method='xml')
    data_dict = dict(xmltodict.parse(xmlstr))

    with open('test1.csv','w') as f:
        w = csv.writer(f)
        w.writerow(data_dict.keys())
        w.writerow(data_dict.values())

output expected :

    id|uniqueid|Name|rate_mrp|rate_discount|rate_discountmonths|fee_type|fee_minimumfee|fee_maxfee
    23|23_0||6.40000|10.00%|2|off|£1,500.75|£10K
    35|35_0||7.90000|5.00%|5|offer|£1k|£22,000
raven
  • 17
  • 3

1 Answers1

1

I'd do this in a very explicit way rather than trying to hack xmltodict to fit your needs.

The only downside I see with this approach is a bit of repetition with the hardcoded headers and tags names.

Also, I don't know how regular you input XML is going to be. If it's possible that some of the tags will not be present then you will need to add some error handling (because node.find will return None, then .text will cause an AttributeError).

rows = []
for abc_node in tree.findall('abc'):
    rate_node = abc_node.find('Rate')
    fee_node = abc_node.find('fee')
    row = {'id': abc_node.find('id').text,
           'uniqueid': abc_node.find('uniqueid').text,
           'Name': abc_node.find('Name').text,
           'rate_mrp': rate_node.find('mrp').text,
           'rate_discount': rate_node.find('discount').text,
           'rate_discountmonths': rate_node.find('discountmonths').text,
           'fee_type': fee_node.find('type').text,
           'fee_minimumfee': fee_node.find('minimumfee').text,
           'fee_maxfee': fee_node.find('maxfee').text}
    rows.append(row)

with open('test.csv', 'w', encoding='utf8') as f:
    headers = ['id', 'uniqueid', 'Name', 'rate_mrp', 'rate_discount', 'rate_discountmonths',
               'fee_type', 'fee_minimumfee', 'fee_maxfee']
    dict_writer = csv.DictWriter(f, fieldnames=headers, lineterminator='\n')
    dict_writer.writeheader()
    dict_writer.writerows(rows)

Output

id,uniqueid,Name,rate_mrp,rate_discount,rate_discountmonths,fee_type,fee_minimumfee,fee_maxfee
23,23_0,,6.40000,10.00%,2,off,"£1,500.75",£10K
35,35_0,,7.90000,5.00%,5,offer,£1k,"£22,000" 

If you want | as delimiter just add delimiter='|' to csv.DictWriter(f, fieldnames=headers, lineterminator='\n')

then the output is

id|uniqueid|Name|rate_mrp|rate_discount|rate_discountmonths|fee_type|fee_minimumfee|fee_maxfee
23|23_0||6.40000|10.00%|2|off|£1,500.75|£10K
35|35_0||7.90000|5.00%|5|offer|£1k|£22,000
DeepSpace
  • 78,697
  • 11
  • 109
  • 154
  • thanks alot for the above , i have got almost 300 tags.. so that would be a massive hardcoding here.. and also as per the .xsd all the attributes are not mandatory as you said.. – raven Jan 15 '20 at 12:22
  • do you think is there a library that could help us with the hardcoding part? and xsd to xml validation? – raven Jan 15 '20 at 12:22