I have the code below, which reads an XML file and tries to convert it to CSV. It works fine for flat data, but when the data has one additional sub-level it throws the error "child index out of range".

Given below is the data set I am trying to work with:

<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<Document>
  <Customer>
    <CustomerCode>ABC</CustomerCode>
    <CustomerName>ABC Co</CustomerName>
    <CustomerBusinessHours>
        <CustomerBusinessHoursTimeZoneOffset>1.000000</CustomerBusinessHoursTimeZoneOffset>
    </CustomerBusinessHours>
  </Customer>
</Document>

Code that I have tried building:

import xml.etree.ElementTree as ET
import csv


tree = ET.parse("/users/desktop/sample.xml")
root = tree.getroot()

# open a file for writing

Resident_data = open('/users/desktop/file.csv', 'w')

# create the csv writer object

csvwriter = csv.writer(Resident_data)
resident_head = []

count = 0
for member in root.findall('Customer'):
    resident = []
    address_list = []
    if count == 0:
        CustomerCode = member.find('CustomerCode').tag
        resident_head.append(CustomerCode)
        CustomerName = member.find('CustomerName').tag
        resident_head.append(CustomerName)
        CustomerBusinessHours = member[3].tag
        resident_head.append(CustomerBusinessHours)
        csvwriter.writerow(resident_head)
        count = count + 1

    CustomerCode = member.find('CustomerCode').text
    resident.append(CustomerCode)
    CustomerName = member.find('CustomerName').text
    resident.append(CustomerName)
    CustomerBusinessHours = member[3][1].text
    address_list.append(CustomerBusinessHours)
    CustomerBusinessHoursTimeZoneOffset = member[3][2].text
    address_list.append(CustomerBusinessHoursTimeZoneOffset)
    csvwriter.writerow(resident)
Resident_data.close()

I get the below error:

CustomerBusinessHours = member[3][1].text
IndexError: child index out of range

Expected output:

CustomerCode,CustomerName,CustomerBusinessHoursTimeZoneOffset
ABC,ABC Co,1.000000
dark horse
  • Why do you assume `member` has at least four elements in the first place? What do you expect it to contain? – tripleee Feb 28 '19 at 13:59
  • @tripleee, I read an article that used `member` as the loop variable. I am basically trying to produce CSV output from the sample data I shared above. I can convert all of the data at the same level to a CSV file, but the element that has a sub-level is what causes the error. – dark horse Feb 28 '19 at 14:01
  • Also, your code has obvious indentation errors; could you please [edit]? – tripleee Feb 28 '19 at 14:01
  • I can only repeat my question. What do you expect `member` to contain at that point in time? Is it surprising to you that there are `member` instances with fewer than four elements? – tripleee Feb 28 '19 at 14:02
  • @tripleee, I have edited my initial post to fix the formatting and to add the expected output for the sample data set. Hope this helps. – dark horse Feb 28 '19 at 14:05
  • At the point you get a traceback, `member` does not have 4 members. Hint: it has 3, the last one of which has a subnode which contains the TZ offset. The indentation of the XML reinforces the hint. – tripleee Feb 28 '19 at 14:07
  • @tripleee, I tried changing the line to `CustomerBusinessHoursTimeZoneOffset = member[2][1].text`, but it still threw the index out of range error. – dark horse Feb 28 '19 at 14:17
  • Do yourself a favor and *examine* the data you are trying to manipulate. Strong hint: You are making the same mistake again. There are not two subnodes. – tripleee Feb 28 '19 at 14:23
  • @darkhorse: Try this approach: [Iterparse XML, and get all, even nested, Sequence Elements](https://stackoverflow.com/a/53883799/7414759). Replace `tag=['entity']` with `tag=['Customer']` and remove the line `entity = {'id': ...`. The result is a `dict` which you can write with `csv.DictWriter(...`. – stovfl Feb 28 '19 at 14:49

1 Answer

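The code below collects the data you are looking for. It looks up each element by tag name with `find()` instead of by child index, so the nested `CustomerBusinessHoursTimeZoneOffset` is reached through its parent `CustomerBusinessHours`.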

import xml.etree.ElementTree as ET

xml = '''<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<Document>
  <Customer>
    <CustomerCode>ABC</CustomerCode>
    <CustomerName>ABC Co</CustomerName>
    <CustomerBusinessHours>
        <CustomerBusinessHoursTimeZoneOffset>1.000000</CustomerBusinessHoursTimeZoneOffset>
    </CustomerBusinessHours>
  </Customer>
</Document>'''

tree = ET.fromstring(xml)
for customer in tree.findall('Customer'):
    print(customer.find('CustomerCode').text)
    print(customer.find('CustomerName').text)
    print(customer.find('CustomerBusinessHours').find('CustomerBusinessHoursTimeZoneOffset').text)

Output

ABC
ABC Co
1.000000
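
To get from the printed values to the CSV layout shown in the question, the same name-based lookups can feed `csv.writer`. The following is a minimal sketch rather than a definitive implementation: the `COLUMNS` mapping and the `customers_to_csv` helper are names made up for this example, and it writes to an in-memory buffer instead of the file path from the question. It relies on `findtext()` accepting a slash-separated path, which reaches the nested element without any positional indexing.

```python
import csv
import io
import xml.etree.ElementTree as ET

XML = '''<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<Document>
  <Customer>
    <CustomerCode>ABC</CustomerCode>
    <CustomerName>ABC Co</CustomerName>
    <CustomerBusinessHours>
        <CustomerBusinessHoursTimeZoneOffset>1.000000</CustomerBusinessHoursTimeZoneOffset>
    </CustomerBusinessHours>
  </Customer>
</Document>'''

# Hypothetical mapping (not from the original post):
# CSV column name -> path relative to each <Customer> element.
COLUMNS = {
    "CustomerCode": "CustomerCode",
    "CustomerName": "CustomerName",
    "CustomerBusinessHoursTimeZoneOffset":
        "CustomerBusinessHours/CustomerBusinessHoursTimeZoneOffset",
}


def customers_to_csv(xml_text):
    """Return CSV text with one header row and one row per <Customer>."""
    root = ET.fromstring(xml_text)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(list(COLUMNS))
    for customer in root.findall("Customer"):
        # findtext() returns the default instead of raising when an
        # element is missing, so a customer without the sub-level
        # yields an empty cell rather than an IndexError.
        row = [customer.findtext(path, default="") for path in COLUMNS.values()]
        writer.writerow(row)
    return buffer.getvalue()


print(customers_to_csv(XML), end="")
```

To write to a real file instead, open it with `open(path, 'w', newline='')` and pass that file object to `csv.writer` in place of the buffer.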
balderman