I'm trying to convert a xml file into a pandas dataframe with the read_xml function. The problem is that one of the nested elements is repeated several times (but with different attributes), and it seems that pandas creates only one column for this element that gets written over every time until it gets to the last one.
Here is the contents of the xml file I used:
<?xml version="1.0" encoding="UTF-8"?>
<histdata>
<item>
<datetime>20-1-2023 00:00:00</datetime>
<value channel="Traffic Total" chid="-1">92 kbit/s</value>
<value channel="Traffic In" chid="0">77 kbit/s</value>
<value channel="Traffic Out" chid="1">16 kbit/s</value>
<value channel="Downtime" chid="-4">0 %</value>
<coverage>100 %</coverage>
</item>
<item>
<datetime>20-1-2023 00:05:00</datetime>
<value channel="Traffic Total" chid="-1">82 kbit/s</value>
<value channel="Traffic In" chid="0">727 kbit/s</value>
<value channel="Traffic Out" chid="1">18 kbit/s</value>
<value channel="Downtime" chid="-4">0 %</value>
<coverage>100 %</coverage>
</item>
</histdata>
simply reading the xml file with read_xml with:
import pandas as pd
df = pd.read_xml('chdata.xml')
df
gives: enter image description here
As you can see, there is only one column for 'value' where I would expect one column for every value element in the item tag of the xml file (maybe with the attribute in the column name to distinct them from the others).
I know I can retrieve the values of these elements with the following code:
chdata = pd.read_xml('chdata.xml', xpath='//value')
chdata
This gives me a dataframe containing all value elements, as shown here: enter image description here
What I actually need is one dataframe that shows the information from both previous df's in one table. The one 'value' column from 'df' should be replaced by four columns containing the four 'value' lines from each item in the xml. One of the attributes (channel name or id) could be used as a column name.
Preferably the table should look something like this:
datetime | chid (or name) from 1st value | same from 2nd | 3rd | 4th | coverage |
---|
Can anyone help me to achieve this? If it's possible with just the tools provided by python and pandas that would be nice, but any solution is welcome.