1

Suppose I have a nested dictionary of the format:

dictionary={
     "A":[1, 2],
     "B":[2, 3],
     "Coords":[{
        "X":[1,2,3],
        "Y":[1,2,3],
        "Z":[1,2,3],
     },{
        "X":[2,3],
        "Y":[2,3],
        "Z":[2,3],
     }]
 }

How can I turn this into a Pandas MultiIndex Dataframe?

Equivalently, how can I produce a Dataframe where the information in the row is not duplicated for every co-ordinate?

In what I imagine, the two rows of output DataFrame should appear as follows:

Index A   B   Coords 
---------------------
0      1   2   X Y Z
              1 1 1
              2 2 2
              3 3 3
--------------------
---------------------
1      2   3   X Y Z
              2 2 2
              3 3 3
--------------------
Jack Rolph
  • 587
  • 5
  • 15
  • what kind of object should `Coords` be in your dataframe? A mixture of different data types? – AloneTogether Jan 22 '22 at 10:53
  • Coords is a 2D list of floats, no more no less. Ideally, I want to be able to access the list in the same manner as a Dataframe, such that Coords itself behaves like or can be easily turned into a Dataframe itself. – Jack Rolph Jan 22 '22 at 10:56
  • So, then, Coords is a DataFrame, I suppose. What I am hoping for is that df.iloc[0]["A"] returns 1 and df.iloc[0]["X"].iloc[0] returns 1 – Jack Rolph Jan 22 '22 at 10:59

1 Answers1

3

From your dictionary :

>>> import pandas as pd

>>> df = pd.DataFrame.from_dict(dictionary)
>>> df
    A   B   Coords
0   1   2   {'X': [1, 2, 3], 'Y': [1, 2, 3], 'Z': [1, 2, 3]}
1   2   3   {'X': [2, 3], 'Y': [2, 3], 'Z': [2, 3]}

Then we can use pd.Series to extract the data in dict in the column Coords like so :

df_concat = pd.concat([df.drop(['Coords'], axis=1), df['Coords'].apply(pd.Series)], axis=1)
>>> df_concat
    A   B   X           Y           Z
0   1   2   [1, 2, 3]   [1, 2, 3]   [1, 2, 3]
1   2   3   [2, 3]      [2, 3]      [2, 3]

To finish we use the explode method to get the list as rows and set the index on columns A and B to get the expected result :

>>> df_concat.explode(['X', 'Y', 'Z']).reset_index().set_index(['index', 'A', 'B'])
                X   Y   Z
index   A   B           
0       1   2   1   1   1
            2   2   2   2
            2   3   3   3
1       2   3   2   2   2
            3   3   3   3

UPDATE :

If you are using a version of Pandas lower than 1.3.0, we can use the trick given by @MillerMrosek in this answer :

def explode(df, columns):
    df['tmp']=df.apply(lambda row: list(zip(*[row[_clm] for _clm in columns])), axis=1) 
    df=df.explode('tmp')
    df[columns]=pd.DataFrame(df['tmp'].tolist(), index=df.index)
    df.drop(columns='tmp', inplace=True)
    return df

explode(df_concat, ["X", "Y", "Z"]).reset_index().set_index(['index', 'A', 'B'])

Output :

                X   Y   Z
index   A   B           
0       1   2   1   1   1
            2   2   2   2
            2   3   3   3
1       2   3   2   2   2
            3   3   3   3
Jack Rolph
  • 587
  • 5
  • 15
tlentali
  • 3,407
  • 2
  • 14
  • 21
  • Hi there! Thank you very much! I have followed your method for my data and I get the following error: 'column must be a scalar'. I don't quite understand why this is happening. – Jack Rolph Jan 22 '22 at 12:25
  • I see, I am using a recent version of Pandas (>=1.3.0) that allow me to make an `explode` on several column. Is it ok for you to update for a newer version of Pandas (pip install pandas==1.3.5) or do you prefer to stay with your current version ? – tlentali Jan 22 '22 at 12:36
  • I just updated the answer to solve the problem if you prefer to stay with your current Pandas version :) – tlentali Jan 22 '22 at 12:42
  • This is perfect! I have made a slight adjustment to allow the user to supply their columns in a list. I will edit if this is ok? – Jack Rolph Jan 22 '22 at 13:03
  • Perfect ! Yes of course ! – tlentali Jan 22 '22 at 13:16
  • I approuved the edit, thanks ! Have a great weekend ! – tlentali Jan 22 '22 at 13:17
  • You too! Thank you very much! – Jack Rolph Jan 22 '22 at 13:24