I have the following DataFrame:

    id           timestamp
0    1 2020-09-01 15:14:35
1    1 2020-09-01 15:15:40
2    1 2020-09-01 15:16:59
3    1 2020-09-01 15:24:42
4    1 2020-09-01 15:25:50
5    1 2020-09-01 15:26:40
6    2 2020-09-01 18:14:35
7    2 2020-09-01 18:17:39
8    2 2020-09-01 18:24:40
9    2 2020-09-01 18:24:42
10   2 2020-09-01 18:34:40
11   2 2020-09-01 18:35:40
12   2 2020-09-01 18:36:40

Each id is a server endpoint, and each timestamp is the time of a single request. [Figure: timeline chart of the requests]
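
For anyone who wants to reproduce this, the frame above can be rebuilt roughly as follows. This is a minimal sketch; it assumes the timestamp column holds real datetimes rather than strings.

```python
import pandas as pd

# Sample data copied from the table above; timestamps are parsed to datetime64.
df = pd.DataFrame({
    "id": [1] * 6 + [2] * 7,
    "timestamp": pd.to_datetime([
        "2020-09-01 15:14:35", "2020-09-01 15:15:40", "2020-09-01 15:16:59",
        "2020-09-01 15:24:42", "2020-09-01 15:25:50", "2020-09-01 15:26:40",
        "2020-09-01 18:14:35", "2020-09-01 18:17:39", "2020-09-01 18:24:40",
        "2020-09-01 18:24:42", "2020-09-01 18:34:40", "2020-09-01 18:35:40",
        "2020-09-01 18:36:40",
    ]),
})
print(df)
```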

I would like to count the number of load periods each server had. I define a load period like so:
At least 3 requests with a time delta of less than 5 minutes between them.

So server 1 has 2 loads, while server 2 has just 1 load. I would like the output to look as follows:

    id           timestamp  loads_detected
0    1 2020-09-01 15:14:35               0
1    1 2020-09-01 15:15:40               0
2    1 2020-09-01 15:16:59               1  <-- 3 requests in a row less than 5 minutes apart
3    1 2020-09-01 15:24:42               1  <-- this request is more than 5 minutes after the previous one
4    1 2020-09-01 15:25:50               1
5    1 2020-09-01 15:26:40               2  <-- 3 requests in a row less than 5 minutes apart
6    2 2020-09-01 18:14:35               0
7    2 2020-09-01 18:17:39               0  <-- only 2 requests less than 5 minutes apart, so the counter does not increase
8    2 2020-09-01 18:24:40               0
9    2 2020-09-01 18:24:42               0
10   2 2020-09-01 18:34:40               0
11   2 2020-09-01 18:35:40               0
12   2 2020-09-01 18:36:40               1  <-- 3 requests in a row less than 5 minutes apart

Any help would be appreciated :)

Mayank Porwal
Shlomi Schwartz

1 Answer

IIUC, you could group by id and by 5-minute bins of the timestamp (with the bins starting at the first timestamp), flag the third request that falls in each bin, and then take a cumulative sum of those flags per id:

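import pandas as pd

# Flag the third request in each (id, 5-minute bin) group; origin='start' starts the bins at the first timestamp
third_in_bin = df.groupby(['id', pd.Grouper(key='timestamp', freq='5min', origin='start')]).cumcount().eq(2)
# Running total of flagged requests per id (cumsum is applied to the flag column only, not the whole frame)
df['loads_detected'] = third_in_bin.astype(int).groupby(df['id']).cumsum()
print(df)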

Output

    id           timestamp  loads_detected
0    1 2020-09-01 15:14:35               0
1    1 2020-09-01 15:15:40               0
2    1 2020-09-01 15:16:59               1
3    1 2020-09-01 15:24:42               1
4    1 2020-09-01 15:25:50               1
5    1 2020-09-01 15:26:40               2
6    2 2020-09-01 18:14:35               0
7    2 2020-09-01 18:17:39               0
8    2 2020-09-01 18:24:40               0
9    2 2020-09-01 18:24:42               0
10   2 2020-09-01 18:34:40               0
11   2 2020-09-01 18:35:40               0
12   2 2020-09-01 18:36:40               1
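
Note that fixed 5-minute bins depend on where the first bin starts, so a burst that straddles a bin boundary could be split across two bins and missed. If a load is instead read as a run of consecutive requests, each less than 5 minutes after the previous one (my assumption about the intended definition), a gap-based variant could look roughly like this. The names `gap`, `new_run`, `run_id` and `pos_in_run` are only illustrative.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "id": [1] * 6 + [2] * 7,
    "timestamp": pd.to_datetime([
        "2020-09-01 15:14:35", "2020-09-01 15:15:40", "2020-09-01 15:16:59",
        "2020-09-01 15:24:42", "2020-09-01 15:25:50", "2020-09-01 15:26:40",
        "2020-09-01 18:14:35", "2020-09-01 18:17:39", "2020-09-01 18:24:40",
        "2020-09-01 18:24:42", "2020-09-01 18:34:40", "2020-09-01 18:35:40",
        "2020-09-01 18:36:40",
    ]),
})

# The gap logic needs requests ordered by time within each id.
df = df.sort_values(["id", "timestamp"]).reset_index(drop=True)

# Time since the previous request of the same id (NaT for the first request of each id).
gap = df.groupby("id")["timestamp"].diff()

# A new run starts at the first request of an id, or after a gap of 5 minutes or more.
new_run = gap.isna() | (gap >= pd.Timedelta(minutes=5))
run_id = new_run.cumsum()

# 0-based position of each request inside its run; position 2 is the third request.
pos_in_run = df.groupby(run_id).cumcount()

# Count each run once, at the moment it reaches 3 requests, then accumulate per id.
df["loads_detected"] = pos_in_run.eq(2).astype(int).groupby(df["id"]).cumsum()
print(df)
```

On the sample data this gives the same loads_detected column as the output above. A run longer than 3 requests is still counted only once under this reading.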
Dani Mesejo