Groupby upon multiple columns, then append

Question

i am trying to count the number of occurrences of the same packet in my dataframe, a packet is the same if it has these matching fields:

['SourceIP', 'SourcePort', 'DestinationIP', 'DestinationPort', 'Protocol']

I want to make a new column in my dataframe with the occurrence values and have this column be called 'PerSec'

I have tried to implement the solution to this problem found here Pandas create new column with count from groupby

However when I run:

df['PerSec'] = df.groupby(['SourceIP', 'SourcePort', 'DestinationIP', 'DestinationPort', 'Protocol']['SourceIP', 'SourcePort', 'DestinationIP', 'DestinationPort', 'Protocol'].transform('PerSec'))
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    print(df)

I receive this error:

TypeError: list indices must be integers or slices, not tuple

Does this mean the groupby method will only work upon one column?

your index should be a list.so groupby([...])[[...]] note the double brackets — sammywemmy, Feb 18 '20 at 11:20
can you give a small sample of your data? but one though might be to pick a sngle column outside your groupers, call it `countcol` say, and then stick `['countcol']` between the groupby and the transform. — ags29, Feb 18 '20 at 11:20
and I think you want `count` inside the transform if I am reading your question correctly — ags29, Feb 18 '20 at 11:21

score 0 · Answer 1 · answered Feb 18 '20 at 11:26

Groupby method can be extended to multiple columns. It will group in the given order of columns

in you case

df.groupby(['SourceIP', 'SourcePort', 'DestinationIP', 'DestinationPort', 'Protocol'])

will return a grouped dataframe in which if SourceIP equals it groups by SourcePort and so on.

score 0 · Answer 2 · answered Feb 18 '20 at 11:26

0

It should be like this

df['PerSec'] = df.groupby([['SourceIP', 'SourcePort', 'DestinationIP', 'DestinationPort', 'Protocol']])['SourceIP', 'SourcePort', 'DestinationIP', 'DestinationPort', 'Protocol'].transform('PerSec')

answered Feb 18 '20 at 11:26

Ahmad Farhan

575
4
12

score 0 · Accepted Answer · answered Feb 18 '20 at 11:36

0

Add only one column after groupby and pass to groupby some aggregate function, e.g. GroupBy.count:

cols = ['SourceIP', 'SourcePort', 'DestinationIP', 'DestinationPort', 'Protocol']
df['PerSec'] = df.groupby(cols)['SourceIP'].transform('count')

answered Feb 18 '20 at 11:36

jezrael

822,522
95
1,334
1,252

Groupby upon multiple columns, then append

3 Answers3