3

I have a dictionary which I want to convert to multilevel column dataframe and the index will be the most outer keys of the dictionary.

my_dict = {'key1': {'sub-key1': {'sub-sub-key1':'a','sub-sub-key2':'b'}, 'sub-key2': {'sub-sub-key1':'aa','sub-sub-key2':'bb'}},
    'key2': {'sub-key1': {'sub-sub-key1':'c','sub-sub-key2':'d'}, 'sub-key2': {'sub-sub-key1':'cc','sub-sub-key2':'dd'}}}

My desired output should look like:

               sub-key1                        sub-key2
    sub-sub-key1    sub-sub-key2     sub-sub-key1    sub-sub-key2
key1    a               b                aa               bb
key2    c               d                cc               dd

I tried to use concat with pd.concat({k: pd.DataFrame.from_dict(my_dict, orient='index') for k, v in d.items()}, axis=1) but the result is not as expected.

I also tried to reform the dictionary.

reformed_dict = {}
for outerKey, innerDict in my_dict.items():
    for innerKey, values in innerDict.items():
        reformed_dict[(outerKey, innerKey)] = values
pd.DataFrame(reformed_dict)

Again the result was not ok. The highest level column and index are interchanged.

Is there any other way to do this?

EMT
  • 458
  • 3
  • 14
  • It looks like you need [pandas.MultiIndex.from_product](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.MultiIndex.from_product.html). – Gonçalo Peres Aug 29 '22 at 14:17

2 Answers2

2

You were pretty close with concat, need to unstack after so like

res = pd.concat({k: pd.DataFrame.from_dict(v, orient='columns') 
                 for k, v in my_dict.items()}
         ).unstack()
print(res)
#          sub-key1                  sub-key2             
#      sub-sub-key1 sub-sub-key2 sub-sub-key1 sub-sub-key2
# key1            a            b           aa           bb
# key2            c            d           cc           dd
Ben.T
  • 29,160
  • 6
  • 32
  • 54
1

Try this one-liner that uses pd.concat, dict comphrension and pd.from_dict to format the dataframe, and pd.unstack to adjust the dataframe structure.

df = pd.concat({k: pd.DataFrame.from_dict(v) for k, v in my_dict.items()}).unstack()

result:

              sub-key1                        sub-key2
    sub-sub-key1    sub-sub-key2    sub-sub-key1    sub-sub-key2
key1    a                 b              aa                bb
key2    c                 d              cc                dd
Massifox
  • 4,369
  • 11
  • 31