How to create new columns in a grouped DataFrame?

Question

I have a DataFrame grouped by a categorical feature. For example, I have this df:

df[['APP_NO', 'REPAY_METHOD', 'RESIDUAL_DEBT']] \
.groupby(['APP_NO', 'REPAY_METHOD']).agg({'RESIDUAL_DEBT' : 'sum'}) 

ID   NUM  CAT_FEAT   aggr  
1   123   2         1233
2   234   2         6631
3   576   -1        -491
4   987   0         5461

NUM is a unique identifier.

As a result, I want to get the following DataFrame:

ID   NUM  CAT_FEAT   aggr_CF2   aggr_CF0   aggr_CFm1   
1   123   2         1233           -1          -1
2   234   2         6631           -1          -1
3   576   -1          -1           -1          -491
4   987   0           -1           5461        -1

That is, for each NUM, I want a separate aggr column for every CAT_FEAT value.

If a NUM has no rows for a given CAT_FEAT value, that column should be filled with -1.

The question is how to implement this correctly. The df shown above is already grouped by NUM, but I also have the original DataFrame without any grouping. Maybe my initial approach was wrong.
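One way this could be sketched, assuming the grouped result looks like the toy table above (columns NUM, CAT_FEAT, aggr): move CAT_FEAT from the rows into the columns with `unstack` and fill the missing combinations with -1. The names `grouped` and `wide` are only illustrative.

```python
import pandas as pd

# Toy values copied from the table above (the already-aggregated result).
grouped = pd.DataFrame({
    "NUM": [123, 234, 576, 987],
    "CAT_FEAT": [2, 2, -1, 0],
    "aggr": [1233, 6631, -491, 5461],
})

# Pivot CAT_FEAT into columns; (NUM, CAT_FEAT) pairs that do not exist become -1.
wide = grouped.set_index(["NUM", "CAT_FEAT"])["aggr"].unstack(fill_value=-1)

# Build names such as aggr_CF2 and aggr_CFm1 ("m" marks a negative category).
wide.columns = [f"aggr_CF{'m' if c < 0 else ''}{abs(c)}" for c in wide.columns]
wide = wide.reset_index()
print(wide)
```

This sketch drops the CAT_FEAT column; if it is still needed, it could be merged back on NUM. Note also that -1 is an ambiguous filler when real aggregates can be negative, as with -491 here.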

There is no input data, this question does not make sense. Can you please provide a [mcve]? — cs95, Jan 06 '19 at 16:12
I can't work out how you're getting your output from your input. Can you provide a bit more detail about that? — Robert Harvey, Jan 06 '19 at 16:13

score 0 · Answer 1 · answered Jan 07 '19 at 08:28

That was just an example. Here is the real data:

ID   APP_NO      REPAY_METHOD   RESIDUAL_DEBT
0    755356650    0.0                0.00
1    756347150    2.0            20490.53
2    756927070   -1.0                0.00
3    757031330    2.0                0.00
4    757233210    2.0                0.00

And I want to get the following:

ID   APP_NO      RESIDUAL_DEBT_RM0   RESIDUAL_DEBT_RM2   RESIDUAL_DEBT_RMm1
0    755356650    0.0                -1                  -1
1    756347150   -1                  20490.53            -1
2    756927070   -1                  -1                   0
3    757031330   -1                   0                  -1
4    757233210   -1                   0                  -1

RESIDUAL_DEBT_RM0 is the feature for rows where REPAY_METHOD = 0, and so on. For each APP_NO, there should be one such feature per REPAY_METHOD value. If an APP_NO has no rows for a given REPAY_METHOD value, the feature should be -1.

In my data, APP_NO is repeated. The main task is to group the data by APP_NO and each categorical feature and build aggregated features.
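Starting from the ungrouped data, a minimal sketch of this could use `pivot_table`, which groups, aggregates, and spreads REPAY_METHOD into columns in one step. The last row of the sample is a hypothetical repeated APP_NO added only to show the summing, and `rm_label` is an illustrative helper, not part of pandas.

```python
import pandas as pd

# Rows from the sample above, plus ONE hypothetical repeated APP_NO (last row)
# so that the "sum" aggregation has something to combine.
df = pd.DataFrame({
    "APP_NO": [755356650, 756347150, 756927070, 757031330, 757233210, 756347150],
    "REPAY_METHOD": [0.0, 2.0, -1.0, 2.0, 2.0, 2.0],
    "RESIDUAL_DEBT": [0.00, 20490.53, 0.00, 0.00, 0.00, 100.00],
})

def rm_label(value):
    """Illustrative helper: 2.0 -> RESIDUAL_DEBT_RM2, -1.0 -> RESIDUAL_DEBT_RMm1."""
    n = int(value)
    return f"RESIDUAL_DEBT_RM{'m' if n < 0 else ''}{abs(n)}"

# One row per APP_NO, one column per REPAY_METHOD, summed RESIDUAL_DEBT inside;
# combinations that never occur are filled with -1.
wide = df.pivot_table(
    index="APP_NO",
    columns="REPAY_METHOD",
    values="RESIDUAL_DEBT",
    aggfunc="sum",
    fill_value=-1,
)
wide.columns = [rm_label(c) for c in wide.columns]
wide = wide.reset_index()
print(wide)
```

For several categorical features, the same call could be repeated per feature and the resulting frames merged on APP_NO.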
