1

i am trying to count the number of occurrences of the same packet in my dataframe, a packet is the same if it has these matching fields:

['SourceIP', 'SourcePort', 'DestinationIP', 'DestinationPort', 'Protocol']

I want to make a new column in my dataframe with the occurrence values and have this column be called 'PerSec'

I have tried to implement the solution to this problem found here Pandas create new column with count from groupby

However when I run:

df['PerSec'] = df.groupby(['SourceIP', 'SourcePort', 'DestinationIP', 'DestinationPort', 'Protocol']['SourceIP', 'SourcePort', 'DestinationIP', 'DestinationPort', 'Protocol'].transform('PerSec'))
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    print(df)

I receive this error:

TypeError: list indices must be integers or slices, not tuple

Does this mean the groupby method will only work upon one column?

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Lyra Orwell
  • 1,048
  • 4
  • 17
  • 46
  • your index should be a list.so groupby([...])[[...]] note the double brackets – sammywemmy Feb 18 '20 at 11:20
  • can you give a small sample of your data? but one though might be to pick a sngle column outside your groupers, call it `countcol` say, and then stick `['countcol']` between the groupby and the transform. – ags29 Feb 18 '20 at 11:20
  • and I think you want `count` inside the transform if I am reading your question correctly – ags29 Feb 18 '20 at 11:21

3 Answers3

0

Groupby method can be extended to multiple columns. It will group in the given order of columns

in you case

df.groupby(['SourceIP', 'SourcePort', 'DestinationIP', 'DestinationPort', 'Protocol'])

will return a grouped dataframe in which if SourceIP equals it groups by SourcePort and so on.

Surya Teja
  • 83
  • 5
0

It should be like this

df['PerSec'] = df.groupby([['SourceIP', 'SourcePort', 'DestinationIP', 'DestinationPort', 'Protocol']])['SourceIP', 'SourcePort', 'DestinationIP', 'DestinationPort', 'Protocol'].transform('PerSec')
Ahmad Farhan
  • 575
  • 4
  • 12
0

Add only one column after groupby and pass to groupby some aggregate function, e.g. GroupBy.count:

cols = ['SourceIP', 'SourcePort', 'DestinationIP', 'DestinationPort', 'Protocol']
df['PerSec'] = df.groupby(cols)['SourceIP'].transform('count')
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252