Is there no syntax suger for dynamic creating columns with multiindexed pandas dataframe?

Question

First, I show the pandas dataframe to elucidate my problem.

import pandas as pd
mi = pd.MultiIndex.from_product([["A","B"],["c","d"]], names=['lv1', 'lv2'])
df1 = pd.DataFrame([[1,2,3,4],[5,6,7,8],[9,10,11,12]],columns=mi)

this python code creates dataframe(df1) like this:

#input dataframe
lv1  A       B
lv2  c   d   c   d
0    1   2   3   4
1    5   6   7   8
2    9  10  11  12

I want to create columns 'c*d' on lv2 by using df1's data. like this:

#output dataframe after calculation
lv1  A           B        
lv2  c   d  c*d  c    d  c*d
0    1   2    2  3    4   12
1    5   6   30  7    8   56
2    9  10   90  11  12  132

For this problem,I wrote some code like this:

for l1 in mi.levels[0]:
    df1.loc[:, (l1, "c*d")] = df1.loc[:,(l1,"c")]*df1.loc[:,(l1,"d")]
df1.sort_index(1,inplace=True)

Although this code almost solved my problem, but I really want to write without 'for' statement like this:

df1.loc[:,(slice(None),"c*d")]=df1.loc[:,(slice(None),"c")]*df1.loc[:,(slice(None),"d")]

With this statement,I got Key error that says 'c*d' is missing. Is there no syntax sugar for this calculation? Or can I achieve better performance by other code?

score 2 · Answer 1 · answered Oct 28 '17 at 08:40

A bit improved your solution:

for l1 in mi.levels[0]:
    df1.loc[:, (l1, "c*d")] = df1.loc[:,(l1,"c")]*df1.loc[:,(l1,"d")]
mux = pd.MultiIndex.from_product([df1.columns.levels[0], ['c','d','c*d']])
df1 = df1.reindex(columns=mux)
print (df1)
   A           B         
   c   d c*d   c   d  c*d
0  1   2   2   3   4   12
1  5   6  30   7   8   56
2  9  10  90  11  12  132

Another solution with stack and unstack:

mux = pd.MultiIndex.from_product([df1.columns.levels[0], ['c','d','c_d']])
df1 = df1.stack(0)
         .assign(c_d = lambda x: x.sum(1))
         .unstack()
         .swaplevel(0,1,1)
         .reindex(columns=mux)
print (df1)
   A           B        
   c   d c_d   c   d c_d
0  1   2   3   3   4   7
1  5   6  11   7   8  15
2  9  10  19  11  12  23

df2 = df1.xs("c", axis=1, level=1).mul(df1.xs("d", axis=1, level=1))
df2.columns = pd.MultiIndex.from_product([df2.columns, ['c*d']])
print (df2)
    A    B
  c*d  c*d
0   2   12
1  30   56
2  90  132

mux = pd.MultiIndex.from_product([df2.columns.levels[0], ['c','d','c*d']])
df = df1.join(df2).reindex(columns=mux)
print (df)
   A           B         
   c   d c*d   c   d  c*d
0  1   2   2   3   4   12
1  5   6  30   7   8   56
2  9  10  90  11  12  132

score 1 · Accepted Answer · answered Oct 29 '17 at 03:35

Explanation of jezrael's answer using stack which is may be the most idiomatic way in pandas.

output = (df1
             # "Stack" data, by moving the top level ('lv1') of the
             # column MultiIndex into row index,
             # now the rows are a MultiIndex and the columns
             # are a regular Index.
             .stack(0)

             # Since we only have 2 columns now, 'lv2' ('c' & 'd')
             # we can multiply them together along the row axis.
             # The assign method takes key=value pairs mapping new column
             # names to the function used to calculate them. Here we're
             # wrapping them in a dictionary and unpacking them using **
             .assign(**{'c*d': lambda x: x.product(axis=1)})

             # Undos the stack operation, moving 'lv1', back to the
             # column index, but now as the bottom level of the column index
             .unstack()

             # This sets the order of the column index MultiIndex levels.
             # Since they are named we can use the names, you can also use
             # their integer positions instead. Here axis=1 references
             # the column index
             .swaplevel('lv1', 'lv2', axis=1)

             # Sort the values in both levels of the column MultiIndex.
             # This will order them as c, c*d, d which is not what you
             # specified above, however having a sorted MultiIndex is required
             # for indexing via .loc[:, (...)] to work properly
             .sort_index(axis=1)
          )

Is there no syntax suger for dynamic creating columns with multiindexed pandas dataframe?

2 Answers2