
I have a dataset with 4000 rows that contains duplicate rows (appearing e.g. 2, 3, or 4 times). I want to find the cumulative sum (cumsum) of the duplicates over time.

I used this code to assign the duplicity count to each row, but it rearranged the positions of the IDs:

df = duplicate_df.value_counts(sort=False, dropna=False).reset_index(name="Duplicity")

Output

ID       Time      Duplicity 
12345    2020        2
12345    2020        2
34567    2021        1
34696    2020        3  
34696    2020        3
34696    2020        3

whereas I want the duplicity added while each ID stays in its original position:

ID       Time      Duplicity 
34696    2020        3  
12345    2020        2
12345    2020        2
34696    2020        3
34696    2020        3
34567    2021        1

How do I find the cumsum of the duplicity over time? Thank you.

Input data:

import pandas as pd
d = {'ID': [34696, 12345, 12345, 34696, 34696, 34567],
     'Time': [2020, 2020, 2020, 2020, 2020, 2021]}
df = pd.DataFrame(d)
  • Are you sure you want the cumsum? It looks like you just want to count how often each row appears. – timgeb Mar 07 '22 at 08:48
  • Thank you for pointing that out, and sorry for the confusion. I want a visualization that shows how the duplicity evolves over time: in the same month, some rows appear twice or three times. How can I visualize that? – Mon Mon Mar 07 '22 at 08:58
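
The comment above asks how to visualize the duplicity over time. One possibility (a minimal sketch, not from the original thread; it assumes one bar per ID within each year, with bar height giving the number of occurrences) is:

import pandas as pd
import matplotlib.pyplot as plt

d = {'ID': [34696, 12345, 12345, 34696, 34696, 34567],
     'Time': [2020, 2020, 2020, 2020, 2020, 2021]}
df = pd.DataFrame(d)

# Count occurrences per (Time, ID) and pivot IDs into columns,
# producing one group of bars per year
counts = df.groupby(['Time', 'ID']).size().unstack(fill_value=0)
counts.plot(kind='bar', xlabel='Time', ylabel='Occurrences')
plt.show()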

1 Answer


Use groupby and transform:

# transform('size') broadcasts each group's size back to every row,
# so the original row order is preserved
df['Duplicity'] = df.groupby(['ID', 'Time'])['ID'].transform('size')
print(df)

# Output
      ID  Time  Duplicity
0  34696  2020          3
1  12345  2020          2
2  12345  2020          2
3  34696  2020          3
4  34696  2020          3
5  34567  2021          1
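
For the cumsum of the duplicity over time that the question also asks about, one option (a minimal sketch, assuming a running total per ID across years is wanted; the CumDuplicity column name is invented here) is to collapse to one row per (ID, Time) pair and accumulate within each ID:

# Keep one row per (ID, Time), order by year, then accumulate
# each ID's duplicity across time
cum = (df.drop_duplicates(['ID', 'Time'])
         .sort_values('Time'))
cum['CumDuplicity'] = cum.groupby('ID')['Duplicity'].cumsum()
print(cum)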