2

I have a python dataframe that looks like the following:

enter image description here

This dataframe has been sorted in descending order by 'transaction_count'. I want to create another column in that dataframe called 'rank' that contains the count of occurrences of cust_ID. My desired output would look something like the following:

enter image description here

For cust_ID = 1234 with transaction_count = 4, the rank would be 1, for the next appearance of cust_ID = 1234, the rank would be 2 and so on.

I tried the following among other things:

df['rank'] = df["cust_ID"].value_counts()
df.head(10)

But the rank column gets created as all NaN values

enter image description here

Any suggestions on how to approach this would be greatly appreciated!

cottontail
  • 10,268
  • 18
  • 50
  • 51
user3116949
  • 265
  • 1
  • 5
  • 14

2 Answers2

4

you can do:

df['rank'] = df.groupby('cust_ID')['transaction_count'].rank(ascending=False)

Output:

    cust_ID     associate_ID    transaction_count   rank
0   1234           608          4                   1.0
1   1234           785          1                   2.0
2   4789           345          2                   1.0
3   3456           268          5                   1.0
4   3456           725          3                   2.0
5   3456           795          1                   3.0

Note that this gives not only the counts, but also the rank of the transaction, based on the transaction_count value.

Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
2

Use groupby + cumcount:

df['rank'] = df.groupby('cust_ID').cumcount() + 1
print(df['rank'])

Output

0    1
1    2
2    1
3    1
4    2
5    3
Name: rank, dtype: int64
Dani Mesejo
  • 61,499
  • 6
  • 49
  • 76