Count duplicates / recurring IDs within 24 hours

Question

I'm trying to count number of comments that each user posted within 24 hours. In other words: count number of duplicates (=user ids) within 24 hours. If count is >= 5 I want to record it.

Example (final output below):

If a user with an ID 100 posted a comment on 13/02/2018 05:15 and 19/02/2018 03:52 and 19/02/2018 16:53 the count should be 2 for this user ID. However I dont want to write it down as it is not >=5
If a user with an ID 521 posted comments on
- 13/02/2018 02:08,
- 19/02/2018 15:05,
- 19/02/2018 15:53,
- 19/02/2018 15:55,
- 19/02/2018 16:00,
- 19/02/2018 18:00,
- 20/02/2018 18:40
First count for end_date_24 19/02/2018 18:00 should be 5, second count for end_date_24 19/02/2018 18:40 should be also 5

Input data:

print(df)
               Date  User_ID  Post_ID
0  13/02/2018 02:08      521       11
1  13/02/2018 05:15      100       12
2  19/02/2018 03:52      100       13
3  19/02/2018 15:05      521       14
4  19/02/2018 15:53      521       15
5  19/02/2018 15:55      521       16
6  19/02/2018 16:00      521       17
7  19/02/2018 16:53      100       18
8  19/02/2018 18:00      521       19
9  20/02/2018 18:40      521       20

Expected output:

        End_Date_24  Count_last_24H  User_ID
0  19/02/2018 18:00               5      521
1  19/02/2018 18:40               5      521

score 0 · Answer 1 · answered Apr 18 '19 at 10:27

0

I would go for pandas. Then wrote a function wchich reduces the data frame to records with given time period and given user ID. At the end counting and logging if a condition is met. Have a look at pandas' DataFrame. I think it provides you everything you want.

answered Apr 18 '19 at 10:27

paragnomen

31
1
3

It looks like OP is already using `pandas`, judging by the name and format of the dataframe. – Graipher Apr 18 '19 at 10:54
Yeah, now I see it ;) – paragnomen Apr 18 '19 at 11:07

Count duplicates / recurring IDs within 24 hours

1 Answers1