
I have a data frame with many columns. I would like to apply a cube-root (cbrt) transformation first and then StandardScaler() to some specific columns, separately for each month, but I received some errors.

import numpy as np
import pandas as pd

df = pd.DataFrame({'month': ['1','1','1','1','1','2','2','2','2','2','2','2'],
                   'X1': [30, 42, 25, 32, 12, 10, 4, 6, 5, 10, 24, 21],
                   'X2': [10, 76, 100, 23, 65, 94, 67, 24, 67, 54, 87, 81],
                   'X3': [23, 78, 95, 52, 60, 76, 68, 92, 34, 76, 34, 12]})
df

My code below works for the cube root, but it doesn't account for month:

df['X1'] = np.cbrt(df['X1'])

Below is the scaling step, but it needs to group by month (this is where I get the errors):

  from sklearn.preprocessing import StandardScaler
  scaler = StandardScaler()
  df['X1_scale'] = scaler.group('Month').fit(df['X1'])

I would like to combine these two operations in an automated function that adds columns X1_Scale and X2_Scale. Since I have so many columns, I would like to do this for the first two columns (e.g. df.iloc[:, 1:3]) in general. Please help. Thank you.
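For reference, one way to combine the two steps with StandardScaler itself is to fit a fresh scaler inside each month group via groupby + transform. This is only a sketch (the helper name scale_per_group is illustrative, not from the question):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({'month': ['1','1','1','1','1','2','2','2','2','2','2','2'],
                   'X1': [30, 42, 25, 32, 12, 10, 4, 6, 5, 10, 24, 21],
                   'X2': [10, 76, 100, 23, 65, 94, 67, 24, 67, 54, 87, 81],
                   'X3': [23, 78, 95, 52, 60, 76, 68, 92, 34, 76, 34, 12]})

cols = df.columns[1:3]  # X1 and X2

def scale_per_group(s):
    # fit a fresh StandardScaler on one month's values of one column;
    # fit_transform needs a 2-D input, hence to_frame()/ravel()
    return StandardScaler().fit_transform(s.to_frame()).ravel()

df[cols + '_Scale'] = (np.cbrt(df[cols])
                       .groupby(df['month'])
                       .transform(scale_per_group))
```

Because StandardScaler standardizes with the population standard deviation (ddof=0), this produces the same numbers as a per-group z-score.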

melik
  • 1,268
  • 3
  • 21
  • 42

1 Answer


We can use np.cbrt to take the element-wise cube root of the first two columns, then groupby on month and transform with zscore to compute the standard score of each sample within its month. Note that zscore defaults to ddof=0, the same convention StandardScaler uses, so the results are identical.

from scipy.stats import zscore

c = df.columns[1:3]
df[c + '_Scale'] = np.cbrt(df[c]).groupby(df['month']).transform(zscore)

   month  X1   X2  X3  X1_Scale  X2_Scale
0      1  30   10  23  0.286075 -1.531934
1      1  42   76  78  1.220298  0.705876
2      1  25  100  95 -0.178042  1.142135
3      1  32   23  52  0.457241 -0.790689
4      1  12   65  60 -1.785572  0.474613
5      2  10   94  76  0.004353  1.026875
6      2   4   67  68 -1.208026  0.093139
7      2   6   24  92 -0.716861 -2.171608
8      2   5   67  34 -0.945947  0.093139
9      2  10   54  76  0.004353 -0.449041
10     2  24   87  34  1.565310  0.804088
11     2  21   81  12  1.296817  0.603408
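Since the question mentions having many columns, the same pattern extends to every column except the grouping key. A small variation on the answer above (not part of the original answer):

```python
import numpy as np
import pandas as pd
from scipy.stats import zscore

df = pd.DataFrame({'month': ['1','1','1','1','1','2','2','2','2','2','2','2'],
                   'X1': [30, 42, 25, 32, 12, 10, 4, 6, 5, 10, 24, 21],
                   'X2': [10, 76, 100, 23, 65, 94, 67, 24, 67, 54, 87, 81],
                   'X3': [23, 78, 95, 52, 60, 76, 68, 92, 34, 76, 34, 12]})

# all columns except the grouping key: X1, X2, X3
c = df.columns.drop('month')
df[c + '_Scale'] = np.cbrt(df[c]).groupby(df['month']).transform(zscore)
```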
Shubham Sharma