Create DataFrame from parent and child

Question

I'm working with data exported from Apple Health as an XML file.

workouts = [node for node in nodes if node.tag == 'Workout']
ET.dump(workouts[0])

This shows me there is a child element, WorkoutStatistics.

<Workout creationDate="2018-08-12 08:58:56 -0600" duration="7.757684383789698" durationUnit="min" endDate="2018-08-12 08:58:55 -0600" sourceName="Strongur" sourceVersion="9.2.0" startDate="2018-08-12 08:51:10 -0600" workoutActivityType="HKWorkoutActivityTypeTraditionalStrengthTraining">
  <WorkoutStatistics endDate="2018-08-12 08:58:55 -0600" startDate="2018-08-12 08:51:10 -0600" sum="30.7305" type="HKQuantityTypeIdentifierActiveEnergyBurned" unit="Cal" />
</Workout>

I created the dataframe for the parent as follows...

workout_list = [x.attrib for x in root.iter('Workout')]

This gives me the parent data with the correct columns, minus some formatting I'll do later. Similarly, I can put the child data into its own DataFrame. However, I'm not sure how to include the child WorkoutStatistics data in the same DataFrame as the parent data.

How do I create a DataFrame that includes both the child data and the parent data? Is there a key that links parent and child, so that only matching records are pulled? Or can I use startDate and endDate to match the records? If so, how do I do that in a merge? Thx!

Please provide some example data. Also, you'll probably get better feedback if you show what output you expect for this input data. — Pawel Kam, Mar 04 '23 at 00:46
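
One possible approach, sketched under assumptions rather than offered as a definitive answer: because each WorkoutStatistics element is nested inside its Workout, the nesting itself links parent and child, so no join key or date-based merge should be needed. The sketch below walks each Workout, copies its attributes, and adds the attributes of each child with a prefix. The helper name workout_rows, the stat_ prefix, and the HealthData wrapper around the posted sample are assumptions made for illustration. The prefix is there because startDate and endDate appear on both parent and child.

```python
import xml.etree.ElementTree as ET

# The Workout element from the question, wrapped in a root element.
# The <HealthData> wrapper is an assumption made for this sketch.
SAMPLE = """<HealthData>
<Workout creationDate="2018-08-12 08:58:56 -0600" duration="7.757684383789698" durationUnit="min" endDate="2018-08-12 08:58:55 -0600" sourceName="Strongur" sourceVersion="9.2.0" startDate="2018-08-12 08:51:10 -0600" workoutActivityType="HKWorkoutActivityTypeTraditionalStrengthTraining">
  <WorkoutStatistics endDate="2018-08-12 08:58:55 -0600" startDate="2018-08-12 08:51:10 -0600" sum="30.7305" type="HKQuantityTypeIdentifierActiveEnergyBurned" unit="Cal" />
</Workout>
</HealthData>"""


def workout_rows(root, prefix="stat_"):
    """Return one dict per (Workout, WorkoutStatistics) pair.

    Hypothetical helper: parent attributes keep their names, child
    attributes get `prefix` so startDate/endDate do not collide.
    A Workout with no statistics still yields one row of parent data.
    """
    rows = []
    for workout in root.iter("Workout"):
        parent = dict(workout.attrib)
        stats = workout.findall("WorkoutStatistics")
        if not stats:
            rows.append(parent)
            continue
        for stat in stats:
            row = dict(parent)  # copy so each row is independent
            row.update({prefix + key: value for key, value in stat.attrib.items()})
            rows.append(row)
    return rows


root = ET.fromstring(SAMPLE)
rows = workout_rows(root)
# With pandas installed, pd.DataFrame(rows) would turn these dicts into
# a DataFrame with one row per statistic.
```

Passing rows to pd.DataFrame (with pandas imported as pd) should then give a single table holding both parent and child columns. All values are still strings at this point, so date and numeric conversion would be a separate formatting step.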
