0

I have a dataframe:

a b  val1_b1 val1_b2 val2_b1 val2_v2
1 2    5       9        4      6

I want to take the max by column group, so the dataframe will be:

a b  val1  val2 
1 2    9     6  

or the RMS:

a b       val1    val2
1 2    sqrt(106)   sqrt(52)   

What is the best way to do so?

Cranjis
  • 1,590
  • 8
  • 31
  • 64

1 Answers1

0

You can use max with DataFrame.groupby by columns, axis=1 and lambda function fors values before first _ (if exist):

df1 = df.groupby(lambda x: x.split('_')[0], axis=1).max()
print (df1)
   a  b  val1  val2
0  1  2     9     6

EDIT for RMS use:

df2 = df.iloc[:, :2]
df3 = df.iloc[:, 2:].pow(2).groupby(lambda x: x.split('_')[0], axis=1).sum().pow(1./2)

#alternative
df3 = (df.iloc[:, 2:].groupby(lambda x: x.split('_')[0], axis=1)
         .apply(lambda a: pd.Series(np.linalg.norm(a))))

df4 = pd.concat([df2, df3], axis=1)
print (df4)

   a  b      val1      val2
0  1  2  10.29563  7.211103
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252