Pandas groupby, filter and sum for each filter

Question

I have a dataframe like below:

data = {'ID': [1,2,3,4,5],
        'NODE_ID': [10,10,20,15,20],
        'TYPE': ['a','a','b','a','b'],
        'DATE': ['2021-12-02','2021-12-02','2021-12-02','2021-12-03','2021-12-02'],
        'HOUR': [0,0,3,2,3],
        'EVENTS_COUNT': [10,15,10,21,12]
        
        }

df = pd.DataFrame(data,columns=['ID','NODE_ID', 'TYPE', 'DATE', 'HOUR', 'EVENTS_COUNT'])

I have two different TYPE - a and b. I want to create a dataframe out of this so I have a sum of each TYPE (a and b) for the group of NODE_ID, DATE, HOUR.

The expected output is

Can you please suggest how to do this?

EDIT: Updated the expected output.

I'm not sure your expected output matches your question? If I group by `NODE_ID`, `DATE` and `HOUR` then shouldn't rows 1 and 3 be in the same group? — tomjn, Dec 15 '21 at 18:15
To add the above, the EVENTS_COUNT_b in row 1 be 22 and you shouldn't have the last row. — not_speshal, Dec 15 '21 at 18:20
This is essentially pivot_table: `df.pivot_table(index=['NODE_ID','DATE','HOUR'],columns='TYPE',values='EVENTS_COUNT',aggfunc=sum,fill_value=0)`. Check the link mentioned as dupe for detailed explanation with examples — anky, Dec 15 '21 at 18:22

not_speshal · Accepted Answer · 2021-12-15T18:30:26.747

2

Try with pivot_table:

output = (df.pivot_table("EVENTS_COUNT",["NODE_ID","DATE","HOUR"],"TYPE","sum")
            .fillna(0)
            .add_prefix("EVENTS_COUNT_")
            .reset_index()
            .rename_axis(None, axis=1))

>>> output

TYPE  NODE_ID        DATE  HOUR  EVENTS_COUNT_a  EVENTS_COUNT_b
0          10  2021-12-02     0            25.0             0.0
1          15  2021-12-03     2            21.0             0.0
2          20  2021-12-02     3             0.0            22.0

If there is only one TYPE for each combination of ["NODE_ID", "DATE", "HOUR"], and you would like this as a separate column, you can do:

output["TYPE"] = output.filter(like="EVENTS_COUNT_").idxmax(axis=1).str.lstrip("EVENTS_COUNT_")

>>> output
   NODE_ID        DATE  HOUR  EVENTS_COUNT_a  EVENTS_COUNT_b TYPE
0       10  2021-12-02     0            25.0             0.0    a
1       15  2021-12-03     2            21.0             0.0    a
2       20  2021-12-02     3             0.0            22.0    b

edited Dec 15 '21 at 18:30

answered Dec 15 '21 at 18:23

not_speshal

22,093
2
15
30

Shouldn't `TYPE` be `A` or `B`? – Derek O Dec 15 '21 at 18:25
If the same combination of NODE_ID, DATE and HOUR has values for both types, that doesn't make sense. – not_speshal Dec 15 '21 at 18:25
I guess we can't know that without more information from @nad – Derek O Dec 15 '21 at 18:27
1

@DerekO - I edited anyway. – not_speshal Dec 15 '21 at 18:30

Pandas groupby, filter and sum for each filter

1 Answers1