Hey guys I'm trying to turn this legacy data from an excel file into a machine readable form for further use. Data looks like this (This is the result of an excel pivot in compact form, which was copy pasted and the original data source is lost): Hierarchical nested data. (csv at the end of this text). The expected result should be like this: Machine readable form.
The length of the number before the category name indicates the hierarchy level of the category. E.g is 1cat the sum of 11cat, 12cat, 13cat and so on.
What I tried to do was to shift the cells depending on the length of the cat number and create a multindex and then transpose or melt or unpivot. But I am really struggling with that one and I hope someone can help me.
I will share some code snippets but they are incomplete and are mere attempts:
df = pd.read_excel(...)
mycolumns = df.columns.tolist()
df["cat"] = df[mycolumns[0]].str.split(' ', n=1).str[0]
df["catlen"] = [len(nr) for nr in df[("cat",)] ]
df["fourdigit"] = df[df[("catlen",)] == 4]
digitsmap = DataFrame(df[mycolumns[0]].str.split(' ', n=1).str[0])
digitsmap[index_name] = df[mycolumns[0]].str.split(' ', n=1).str[-1]
mycolumns = digitsmap.columns.tolist()
digitsmap["catlen"] = [len(nr) for nr in digitsmap[mycolumns[0]] ]
onedigit = digitsmap[digitsmap[("catlen",)] == 1]
twodigit = digitsmap[digitsmap[("catlen",)] == 2]
threedigit = digitsmap[digitsmap[("catlen",)] == 3]
fourdigit = digitsmap[digitsmap[("catlen",)] == 4]
CSV
Category;Value
1 Main Category;25,27
11 Subcategory;10,16
111 Subcategory a;4,34
112 Subcategory b;1,90
113 Subcategory c;
119 Subcategory i;3,92
12 Subcategory2;15,11
121 Subcategory2 a;9,84
1211 Subcategory2 aa;8,24
1212 Subcategory2 ab;0,11
1219 Subcategory2 ai;1,49
122 Subcategory2 b;5,27
Thanks guys!