My Spark DataFrame contains the following data:
user_id | id | timestamp
--------|----|-------------------
123     | 2  | 2018-10-12 9:25:30
123     | 3  | 2018-10-12 9:27:20
123     | 4  | 2018-10-12 9:45:15
123     | 5  | 2018-10-12 9:47:40
234     | 6  | 2018-10-12 9:26:32
234     | 7  | 2018-10-12 9:28:21
234     | 8  | 2018-10-12 9:46:16
234     | 9  | 2018-10-12 9:48:43
For each user, I need to count the records whose timestamps are less than 15 minutes apart, and report the time range that each such group of records covers. The result should look like this:
user_id | count | window
--------|-------|---------------------------------------
123     | 2     | 2018-10-12 9:25:30 - 2018-10-12 9:27:20
123     | 2     | 2018-10-12 9:45:15 - 2018-10-12 9:47:40
234     | 2     | 2018-10-12 9:26:32 - 2018-10-12 9:28:21
234     | 2     | 2018-10-12 9:46:16 - 2018-10-12 9:48:43
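
One way to read this requirement, sketched in plain Python rather than Spark, is gap-based grouping: within each user, a record joins the current group if it comes less than 15 minutes after the previous record, and starts a new group otherwise. This is only an illustration of the expected semantics under that assumption. The names `ROWS`, `MAX_GAP`, `parse` and `group_by_gap` are hypothetical and not part of any Spark API.

```python
from datetime import datetime, timedelta
from itertools import groupby

# Sample rows copied from the table above: (user_id, id, timestamp).
ROWS = [
    (123, 2, "2018-10-12 9:25:30"),
    (123, 3, "2018-10-12 9:27:20"),
    (123, 4, "2018-10-12 9:45:15"),
    (123, 5, "2018-10-12 9:47:40"),
    (234, 6, "2018-10-12 9:26:32"),
    (234, 7, "2018-10-12 9:28:21"),
    (234, 8, "2018-10-12 9:46:16"),
    (234, 9, "2018-10-12 9:48:43"),
]

# Assumption: a gap of 15 minutes or more closes the current group.
MAX_GAP = timedelta(minutes=15)


def parse(ts):
    """Parse a timestamp string in the format used by the sample data."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")


def group_by_gap(rows, max_gap=MAX_GAP):
    """Return (user_id, count, window) tuples, one per group of close records."""
    result = []
    # Sort by user, then by time, so each user's records are consecutive.
    ordered = sorted(rows, key=lambda r: (r[0], parse(r[2])))
    for user_id, user_rows in groupby(ordered, key=lambda r: r[0]):
        current = []  # timestamp strings of the group being built
        previous = None
        for _, _, ts in user_rows:
            t = parse(ts)
            # Close the group when the gap to the previous record is too large.
            if previous is not None and t - previous >= max_gap:
                result.append((user_id, len(current), f"{current[0]} - {current[-1]}"))
                current = []
            current.append(ts)
            previous = t
        if current:
            result.append((user_id, len(current), f"{current[0]} - {current[-1]}"))
    return result


for row in group_by_gap(ROWS):
    print(row)
```

On the sample data this yields the four rows of the expected result. A fixed 15-minute bucket per user would also match this particular sample, so the gap-based reading is a guess at the intent, not something the data alone confirms.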