I have the following code to parse from an xml file to produce a pandas dataframe. The XML file looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<Entries>
<EntrySynopsisDetail_1_0>
<EntryID>262148</EntryID>
<EntryTitle>Establishment of the Graduate Internship Program</EntryTitle>
<CategoryOfEntry>ENG</CategoryOfEntry>
</EntrySynopsisDetail_1_0>
<EntrySynopsisDetail_1_0>
<EntryID>2667654</EntryID>
<EntryTitle>Call for Mobility Program</EntryTitle>
<CategoryOfEntry>ENG</CategoryOfEntry>
<CategoryOfEntry>MAT</CategoryOfEntry>
</EntrySynopsisDetail_1_0>
</Entries>
And my code is below:
from bs4 import BeautifulSoup
import pandas as pd
fd = open("file_120123.xml",'r')
data = fd.read()
Bs_data = BeautifulSoup(data,'xml')
ID = Bs_data.find_all('EntryID')
Title = Bs_data.find_all('EntryTitle')
try:
Cat = Bs_data.find_all('CategoryOfEntry')
except IndexError:
Cat = ''
CatDict = {
"ENG":"English",
"MAT" :"Mathematics"
}
dataDf = []
for i in range(0,len(ID)):
if (Cat[i] == CatDict):
Cat[i] == CatDict.get(Cat[i])
rows = [ID[i].get_text(), Title[i].get_text(), Cat[i])
dataDf.append(rows)
df = pd.DataFrame(dataDf, columns =['ID', 'Title', 'Category'], dtype=float)
df.to_csv('120123.csv')
As you see, the code reads a xml file called 'file_120123.xml' using BeautifulSoup library, and calls each of the elements present in the file. Now one of the elements is a key and I have created a dictionary listing all possible keys. Not all parents have that element. I want to compare the extracted key with the ones in the dictionary and replace that with the value corresponding to that key.
With this code, I get the error IndexError: list index out of range
on Cat[i]
on if (Cat[i] == CatDict):
line. Any insights on how to resolve this?