
I have the following df:

Col1    Col2
test    Something
test2   Something
test3   Something
test    Something
test2   Something
test5   Something

I want to get

Col1    Col2          Occur
test    Something     2
test2   Something     2
test3   Something     1
test    Something     2
test2   Something     2
test5   Something     1

I've tried to use:

df["Occur"] = df["Col1"].value_counts()

But it didn't help: I got an Occur column full of NaN values.
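(For context on why this fails: `value_counts` returns a Series indexed by the *unique values* of Col1, not by the DataFrame's row index, so the assignment cannot align and produces NaN. A minimal sketch of one workaround, mapping each row's value through the counts:)

```python
import pandas as pd

df = pd.DataFrame({
    'Col1': ['test', 'test2', 'test3', 'test', 'test2', 'test5'],
    'Col2': ['Something'] * 6,
})

counts = df['Col1'].value_counts()  # index: unique values, not row labels
# Mapping each row's value through the counts realigns them to the row index
df['Occur'] = df['Col1'].map(counts)
print(df)
```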

jpp
Laser

3 Answers


You can also use GroupBy + transform with 'size':

df['Occur'] = df.groupby('Col1')['Col1'].transform('size')

print(df)

    Col1       Col2  Occur
0   test  Something      2
1  test2  Something      2
2  test3  Something      1
3   test  Something      2
4  test2  Something      2
5  test5  Something      1
jpp

groupby on 'Col1' and then apply transform on 'Col2' to return a Series whose index is aligned to the original df, so you can add it as a column:

In [3]:
df['Occur'] = df.groupby('Col1')['Col2'].transform(pd.Series.value_counts)
df

Out[3]:
    Col1       Col2 Occur
0   test  Something     2
1  test2  Something     2
2  test3  Something     1
3   test  Something     2
4  test2  Something     2
5  test5  Something     1
EdChum

I can't get the other answers to work when I want to retain more columns than just Col1 and Col2. The approach below works for me with any number of additional columns retained.

df['Occur'] = df['Col1'].apply(lambda x: (df['Col1'] == x).sum())
Henrik
  • I cannot replicate the behaviour implied in this answer. Assigning a series to `df['Occur']` as in the other answers does not impact other columns. Further, using `lambda` + manual sums with a Python-level `apply` loop will generally be inefficient versus `transform` + vectorised functionality. – jpp Jul 15 '20 at 08:08
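(The efficiency point in this comment can be checked with `timeit`; exact numbers depend on the machine, and the frame size below is an arbitrary choice, but on frames of this size the vectorised `transform` typically wins by a wide margin over the row-wise `apply`:)

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({'Col1': rng.integers(0, 100, size=2_000)})

# Vectorised: one grouped size computation, broadcast back to rows
t_transform = timeit.timeit(
    lambda: df.groupby('Col1')['Col1'].transform('size'), number=3)

# Python-level loop: a full column scan per row
t_apply = timeit.timeit(
    lambda: df['Col1'].apply(lambda x: (df['Col1'] == x).sum()), number=3)

print(f"transform: {t_transform:.4f}s  apply: {t_apply:.4f}s")
```

Both produce the same Occur values; only the cost differs.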