1

I have a data:

# dt
Column1     
      1
      2
      3
      4
      5
      6
      7
      8
      9

I want to create a new column by bins' average of min and max.

# dt
Column1    Column2
      1          2
      2          2
      3          2
      4          5
      5          5
      6          5
      7          8
      8          8
      9          8

pd.qcut(dt['Column1'], 3)

So column2 = (min of bin + max of bin)/2.

Peter Chen
  • 1,464
  • 3
  • 21
  • 48

1 Answers1

1

Use GroupBy.transform with lambda function for return Series with same size like original DataFrame:

dt['Column2'] = (dt.groupby(pd.qcut(dt['Column1'], 3))['Column1']
                   .transform(lambda x: x.max() + x.min()) / 2)

Or use double transform with add and div:

g = dt.groupby(pd.qcut(dt['Column1'], 3))
dt['Column2'] = g['Column1'].transform('max').add(g['Column1'].transform('min')).div(2)
print (dt)
   Column1  Column2
0        1      2.0
1        2      2.0
2        3      2.0
3        4      5.0
4        5      5.0
5        6      5.0
6        7      8.0
7        8      8.0
8        9      8.0

EDIT:

cols = ['Column1']
for col in cols:
    dt[f'New {col}'] = (dt.groupby(pd.qcut(dt[col], 3))[col]
                       .transform(lambda x: x.max() + x.min()) / 2)
print (dt)
   Column1  New Column1
0        1          2.0
1        2          2.0
2        3          2.0
3        4          5.0
4        5          5.0
5        6          5.0
6        7          8.0
7        8          8.0
8        9          8.0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • If there are many columns, how can I generate without type multiple times? In specific, new column from original called `Old column_B`. – Peter Chen Oct 24 '18 at 05:20
  • 1
    @PeterChen - Do you think like edited answer? Columns for processing are in list `cols` – jezrael Oct 24 '18 at 05:23
  • I used your code on this: `for col in cols: dt[f'New {col}'] = dt.clip(dt.quantile(0.25) - 1.5*IQR, dt.quantile(0.75) + 1.5*IQR, axis = 1)[col]` , but it did not work. maybe I cannot use this? – Peter Chen Oct 25 '18 at 01:33
  • @PeterChen - I tested it with `dt = pd.DataFrame({'Column1': {0: 1, 1: 20, 2: 3, 3: 4, 4: 10, 5: 6, 6: 7, 7: 8, 8: 9}, 'aaa': {0: 4, 1: 7, 2: 8, 3: 2, 4: 1, 5: 30, 6: 1, 7: 6, 8: 3}})` – jezrael Oct 25 '18 at 05:16
  • and then `IQR = 0.2 cols = dt.columns for col in cols: dt[f'New {col}'] = dt.clip(dt.quantile(0.25) - 1.5*IQR, dt.quantile(0.75) + 1.5*IQR, axis = 1)[col]` – jezrael Oct 25 '18 at 05:16
  • it return some values, but not sure, if correct. Can you explain more what means not working? – jezrael Oct 25 '18 at 05:17