
I have dataframe X

>>> X
                    A                              B                    
                   x1        x2 intercept         x1        x2 intercept
Date                                                                    
2020-12-31  48.021395  2.406670         1 -11.538462  2.406670         1
2021-03-31  33.229490  2.410444         1 -23.636364  2.405720         1
2021-06-30  11.498812  2.419787         1 -32.727273  2.402403         1
2021-09-30   5.746014  2.583867         1 -34.000000  2.479682         1
2021-12-31   4.612371  2.739457         1 -39.130435  2.496616         1
2022-03-31   3.679404  2.766474         1 -40.476190  2.411736         1
2022-06-30   3.248155  2.771958         1 -45.945946  2.303280         1

and series b:

>>> b
x1          -0.006
x2           0.083
intercept    0.017
dtype: float64

I need to compute the dot product of each of the groups A and B with b, and put the results in one dataframe. I can go through each group explicitly, like the following:

result = pd.concat(
    [X["A"].dot(b).rename("A"), X["B"].dot(b).rename("B")], axis=1
)

                   A         B
Date                          
2020-12-31 -0.071375  0.285984
2021-03-31  0.017690  0.358493
2021-06-30  0.148849  0.412763
2021-09-30  0.196985  0.426814
2021-12-31  0.216701  0.459002
2022-03-31  0.224541  0.460031
2022-06-30  0.227584  0.483848
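For reference, the setup above can be reproduced end to end (only the first two rows of X are rebuilt here; the two-level column layout is what matters):

```python
import pandas as pd

# Rebuild a two-row version of X with the same two-level column layout
cols = pd.MultiIndex.from_product([["A", "B"], ["x1", "x2", "intercept"]])
X = pd.DataFrame(
    [[48.021395, 2.406670, 1, -11.538462, 2.406670, 1],
     [33.229490, 2.410444, 1, -23.636364, 2.405720, 1]],
    index=pd.to_datetime(["2020-12-31", "2021-03-31"]).rename("Date"),
    columns=cols,
)
b = pd.Series({"x1": -0.006, "x2": 0.083, "intercept": 0.017})

# Explicit per-group dot products, concatenated side by side
result = pd.concat(
    [X["A"].dot(b).rename("A"), X["B"].dot(b).rename("B")], axis=1
)
```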

Is there a way to achieve the same without explicitly looping through the groups? In particular, is it possible to first group by the first level of the column MultiIndex and then apply the dot product to each group? For example:

result = X.groupby(level=0, axis=1).apply(lambda x: x.dot(b))

This gives me `ValueError: matrices are not aligned`, which I think is because each group in X still has two levels in its columns, whereas b's index is a flat index. So I would need to add a level to b's index to match the columns of X? Like:

result = X.groupby(level=0, axis=1).apply(
    lambda x: x.dot(pd.concat([b], keys=[x.columns.get_level_values(0)[0]]))
)

With this I get `ValueError: cannot reindex from a duplicate axis`. I am stuck here.
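As a side note (not the `groupby` approach asked about): since every top-level group has the same three columns in the same order, the per-group products can also be done in one batched matrix product by reshaping the values into a 3-D array. A minimal sketch on stand-in data, assuming b's index order matches the per-group column order:

```python
import numpy as np
import pandas as pd

# Small stand-in for X with the same (group, factor) column layout
cols = pd.MultiIndex.from_product([["A", "B"], ["x1", "x2", "intercept"]])
X = pd.DataFrame(
    [[48.021395, 2.406670, 1, -11.538462, 2.406670, 1]],
    index=pd.to_datetime(["2020-12-31"]).rename("Date"),
    columns=cols,
)
b = pd.Series({"x1": -0.006, "x2": 0.083, "intercept": 0.017})

groups = X.columns.get_level_values(0).unique()
# (n_dates, n_groups, n_factors) @ (n_factors,) -> (n_dates, n_groups)
vals = X.to_numpy().reshape(len(X), len(groups), len(b))
result = pd.DataFrame(vals @ b.to_numpy(), index=X.index, columns=groups)
```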

Ke Cai

1 Answer


Use `DataFrame.droplevel` to remove the top level, combined with `rename`:

f = lambda x: x.droplevel(0, axis=1).dot(b).rename(x.name)
result = X.groupby(level=0, axis=1).apply(f)
print(result)
                   A         B
Date
2020-12-31 -0.071375  0.285984
2021-03-31  0.017690  0.358493
2021-06-30  0.148849  0.412763
2021-09-30  0.196985  0.426814
2021-12-31  0.216701  0.459002
2022-03-31  0.224541  0.460031
2022-06-30  0.227584  0.483848
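One caveat worth adding: `groupby(..., axis=1)` is deprecated in recent pandas versions (2.1+). A dictionary passed to `pd.concat` gives the same result without it (a sketch on stand-in data):

```python
import pandas as pd

# Stand-in data with the same two-level column layout as X
cols = pd.MultiIndex.from_product([["A", "B"], ["x1", "x2", "intercept"]])
X = pd.DataFrame(
    [[48.021395, 2.406670, 1, -11.538462, 2.406670, 1]],
    index=pd.to_datetime(["2020-12-31"]).rename("Date"),
    columns=cols,
)
b = pd.Series({"x1": -0.006, "x2": 0.083, "intercept": 0.017})

# One dot product per top-level group, keyed by group name
result = pd.concat(
    {g: X[g].dot(b) for g in X.columns.get_level_values(0).unique()}, axis=1
)
```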
jezrael
  • Thanks! This is super helpful. What exactly is the input to the function being applied? I thought it was the sub-dataframe under the top-level index, but `x.name` suggests it is a `pandas.Series`? – Ke Cai Nov 18 '20 at 09:45
  • @KeCai - `x` is first a DataFrame; after `x.droplevel(0, axis=1).dot(b)` it is a `Series`, so it is possible to use `rename` here – jezrael Nov 18 '20 at 09:54
  • I naively thought `x.droplevel(0, axis=1).dot(b)` would result in a nameless `Series`. If I do `df.groupby(level=0, axis=1).get_group("A").droplevel(0, axis=1).dot(b).name` I get nothing. I am still missing something. – Ke Cai Nov 18 '20 at 10:08
  • @KeCai - Yes, that is expected, so I added `rename` with the grouping key (here `A`, `B`) to rename the nameless Series to an `A` or `B` Series – jezrael Nov 18 '20 at 10:12
  • Sorry, still confused -- yes, `rename` gives `x.name` to the nameless `x.droplevel(0, axis=1).dot(b)`, but where does `x.name` come from? What is the `x` in `x.name`? – Ke Cai Nov 18 '20 at 10:34
  • @KeCai - good question; I think because of the MultiIndex in the columns, it is the value from the first level, because of `df.groupby(level=0, axis=1)` – jezrael Nov 18 '20 at 10:36
  • After reading through some other questions, it seems that the `name` attribute of a group is only visible when calling `apply`, for example here [link](https://stackoverflow.com/questions/32460593/including-the-group-name-in-the-apply-function-pandas-python). Thanks for your patient response! – Ke Cai Nov 18 '20 at 10:58