1

I've got two DataFrames A and B, both with a MultiIndex in the columns. B however has the order of the columns switched up. Now I would simply like to get the columns of B in the same order as in A.

The clue is, the order depends on the second level of the MultiIndex. Pandas' reindex() should do just that, but it does only work for me at level 0 as shown in the following

df = pd.DataFrame([[1,2,3],[4,5,6]])
df.columns = pd.MultiIndex.from_arrays([["a","b","c"], ["aa","bb","cc"]])
print(df)

outputs

    a   b   c
    aa  bb  cc
0   1   2   3
1   4   5   6

First level of the MultiIndex:

df = df.reindex(columns=["b","a","c"], level=0)
print(df)

    b   a   c
    bb  aa  cc
0   2   1   3
1   5   4   6

Second level:

df = df.reindex(columns=["bb","aa","cc"], level=1)
df
    a   b   c
    aa  bb  cc
0   1   2   3
1   4   5   6

The function doesn't raise an exception. And I've also tried it with columns=["bb","aa","cc"], level=2 or columns=["bb","aa","cc"], level=0. There it outputs an empty DataFrame, so my implementation seems to be correct, but just nothing happens...

The next best workaround I can think of is changing level 1 to level 0, but that shouldn't be necessary. Looking forward to your ideas!

Nicoleue
  • 45
  • 6
  • Indices are hierarchical. After you re-index your data still reflects the order of the top level index because that one hasn't changed. – ifly6 Aug 29 '23 at 15:32

2 Answers2

4

If you want to reindex both levels, just pass your MultiIndex:

A = pd.DataFrame([[1,2,3],[4,5,6]])
A.columns = pd.MultiIndex.from_arrays([["a","b","c"], ["aa","bb","cc"]])

B = pd.DataFrame(columns=pd.MultiIndex.from_arrays([["b","a","c"],
                                                    ["bb","aa","cc"]]),
                 index=[0])

out = A.reindex(columns=B.columns)

Or maybe you just want to sort_index:

order = ["bb","aa","cc"]
out = A.sort_index(level=1, axis=1,
                   key=pd.Series({v:k for k,v in enumerate(order)}).get)

Output:

   b  a  c
  bb aa cc
0  2  1  3
1  5  4  6
mozway
  • 194,879
  • 13
  • 39
  • 75
  • maybe `key=order.index` works as well for not-so-wide dataframe. – Quang Hoang Aug 29 '23 at 15:55
  • Thanks a lot! Your application of `reindex()` works. But for understanding, my implementation should work as well, or? – Nicoleue Aug 29 '23 at 15:58
  • @QuangHoang yes it does but it has quadratic complexity so I prefer to never give such shortcuts, we never know who will read this and reuse the code ;) – mozway Aug 29 '23 at 16:03
  • 1
    @Nicoleue `reindex` indeed behaves differently with single and MultiIndexes. – mozway Aug 29 '23 at 16:13
  • I now need the second option as well, as I have different levels in different DataFrames, that only share the second index level. So the extensive answer is much appreciated(: – Nicoleue Aug 29 '23 at 17:04
0

assign back to df

df = df.reindex(columns=["b","a","c"], level=0)
df = df.reindex(columns=["bb","aa","cc"], level=1)
print(df)

   b  a  c
  bb aa cc
0  2  1  3
1  5  4  6
nisakova
  • 89
  • 6