
Not able to fetch all the columns of a DataFrame after applying the groupby method of Pandas

I have a sample DataFrame as below:

  col1 col2        day col4
0   a1   b1     monday   c1
1   a2   b2    tuesday   c2
2   a3   b3  wednesday   c3
3   a1   b1     monday   c5

Here 'a1 b1 monday' appears twice, so after the groupby the output should be:

col1  col2        day  col4  count
  a1    b1     monday    c1      2
  a2    b2    tuesday    c2      1
  a3    b3  wednesday    c3      1

I tried using df.groupby(['col1','day'],sort=False).size().reset_index(name='Count')

and

df.groupby(['col1','day']).transform('count')

and the output is always

col1        day  count
  a1     monday      2
  a2    tuesday      1
  a3  wednesday      1

whereas my original data has 14 columns, and it doesn't make sense to list all the column names in the groupby statement. Is there a better, more Pythonic way to achieve this?
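The reason the other columns disappear is that `size()` is an aggregation: it reduces each group to a single number indexed by the group keys only, so `col2` and `col4` have nowhere to go. A transform, by contrast, returns one value per original row. A minimal sketch contrasting the two, rebuilding the sample frame above (the `keys` variable is a hypothetical helper, not part of the original code):

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'col1': ['a1', 'a2', 'a3', 'a1'],
    'col2': ['b1', 'b2', 'b3', 'b1'],
    'day': ['monday', 'tuesday', 'wednesday', 'monday'],
    'col4': ['c1', 'c2', 'c3', 'c5'],
})
keys = ['col1', 'day']  # hypothetical helper: the grouping columns

# Aggregation: one value per group, indexed only by the group keys.
sizes = df.groupby(keys, sort=False).size()
print(list(sizes.index.names))  # ['col1', 'day']
print(len(sizes))               # 3 groups

# Transform: one value per original row, aligned to df's index,
# so it can be attached as a new column without losing any columns.
per_row = df.groupby(keys, sort=False)['col1'].transform('size')
print(per_row.tolist())         # [2, 1, 1, 2]
```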

user3483203
Kumar-58
  • `df.groupby(['col1', 'day'])['col4'].agg(['first', 'count']).reset_index()` – user3483203 Apr 18 '19 at 22:43
  • As I mentioned, the real data has 14 columns in total with long column names, so listing all 14 column names in the groupby statement is not practical – Kumar-58 Apr 18 '19 at 22:48
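The `agg(['first', 'count'])` idea from the first comment can be extended to every column without naming them: `GroupBy.first()` applies to all non-key columns at once. A minimal sketch, assuming the sample frame from the question and that taking the first value of each non-key column is acceptable (`keys`, `g` and `out` are hypothetical names). Note that `first()` skips missing values column by column, so with NaNs it can combine values from different rows of the same group:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'col1': ['a1', 'a2', 'a3', 'a1'],
    'col2': ['b1', 'b2', 'b3', 'b1'],
    'day': ['monday', 'tuesday', 'wednesday', 'monday'],
    'col4': ['c1', 'c2', 'c3', 'c5'],
})

keys = ['col1', 'day']  # only the grouping columns are named
g = df.groupby(keys, sort=False)

# first() returns the first non-null value of every non-key column per group.
out = g.first()
out['count'] = g.size()  # aligned on the (col1, day) index
# Move the keys back into columns and restore the original column order.
out = out.reset_index()[list(df.columns) + ['count']]
print(out)
```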

1 Answer


First, use groupby with transform to create your count column.

Then use drop_duplicates to remove duplicate rows:

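# count the rows in each (col1, day) group and broadcast that count to every row
df['count'] = df.groupby(['col1','day'],sort=False)['col1'].transform('size')
# keep only the first row for each (col1, day) pair
df.drop_duplicates(['col1', 'day'], inplace=True)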

print(df)
  col1 col2        day col4  count
0   a1   b1     monday   c1      2
1   a2   b2    tuesday   c2      1
2   a3   b3  wednesday   c3      1
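For reference, a self-contained version of the same two steps, assuming the sample data from the question. It avoids `inplace=True` by chaining `assign` and `drop_duplicates`, and keeps the grouping columns in one list (a hypothetical `keys` variable) so they are written only once, however many other columns the frame has. `drop_duplicates` keeps each group's entire first row intact:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'col1': ['a1', 'a2', 'a3', 'a1'],
    'col2': ['b1', 'b2', 'b3', 'b1'],
    'day': ['monday', 'tuesday', 'wednesday', 'monday'],
    'col4': ['c1', 'c2', 'c3', 'c5'],
})
keys = ['col1', 'day']  # hypothetical helper: the only columns you must name

# Count rows per (col1, day) group and broadcast the count back to each row.
counts = df.groupby(keys, sort=False)[keys[0]].transform('size')

# assign() returns a new frame; drop_duplicates keeps the first row per key pair.
result = df.assign(count=counts).drop_duplicates(subset=keys, keep='first')
print(result)
```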
Erfan