I got a search sessions log that looks like this:
+----------+-------------------------+----------+
| dt | search_time | searches |
+----------+-------------------------+----------+
| 20200601 | 2020-06-01 00:36:38.000 | 1 |
| 20200601 | 2020-06-01 00:37:38.000 | 1 |
| 20200601 | 2020-06-01 00:39:18.000 | 1 |
| 20200601 | 2020-06-01 01:16:18.000 | 1 |
| 20200601 | 2020-06-01 03:56:38.000 | 1 |
| 20200601 | 2020-06-01 05:36:38.000 | 1 |
| 20200601 | 2020-06-01 05:37:38.000 | 1 |
| 20200601 | 2020-06-01 05:39:38.000 | 1 |
| 20200601 | 2020-06-01 05:41:38.000 | 1 |
| 20200601 | 2020-06-01 07:26:38.000 | 1 |
+----------+-------------------------+----------+
My task is to partition each row into session groups. Session groups are up to five minutes.
For example:
Those TOP 3 sessions will form a group session 1 - if we accumulate the minutes between each row, we will get 3 minutes and the 4th would accumulate to more then 5 minutes so it will be a different session group.
+----------+-------------------------+----------+---------------+
| dt | search_time | searches | group_session |
+----------+-------------------------+----------+---------------+
| 20200601 | 2020-06-01 00:36:38.000 | 1 | 1 |
| 20200601 | 2020-06-01 00:37:38.000 | 1 | 1 |
| 20200601 | 2020-06-01 00:39:18.000 | 1 | 1 |
| 20200601 | 2020-06-01 01:16:18.000 | 1 | 2 |
+----------+-------------------------+----------+---------------+
I manipulated the table like this in order to get it ready for partitioning:
WITH [Sub Table] AS
(
SELECT [dt]
,[search_time]
,[pervious search time] = LAG(search_time) OVER (ORDER BY search_time)
,[min diff] = ISNULL(DATEDIFF(MINUTE,LAG(search_time) OVER (ORDER BY search_time),search_time),0)
,[searches]
FROM [search_session]
)
SELECT
[dt],
[search_time],
[pervious search time],
[min diff],
[searches]
FROM [Sub Table]
And got this:
+----------+-------------------------+-------------------------+----------+----------+
| dt | search_time | pervious search time | min diff | searches |
+----------+-------------------------+-------------------------+----------+----------+
| 20200601 | 2020-06-01 00:36:38.000 | NULL | 0 | 1 |
| 20200601 | 2020-06-01 00:37:38.000 | 2020-06-01 00:36:38.000 | 1 | 1 |
| 20200601 | 2020-06-01 00:39:18.000 | 2020-06-01 00:37:38.000 | 2 | 1 |
| 20200601 | 2020-06-01 01:16:18.000 | 2020-06-01 00:39:18.000 | 37 | 1 |
| 20200601 | 2020-06-01 03:56:38.000 | 2020-06-01 01:16:18.000 | 160 | 1 |
| 20200601 | 2020-06-01 05:36:38.000 | 2020-06-01 03:56:38.000 | 100 | 1 |
| 20200601 | 2020-06-01 05:37:38.000 | 2020-06-01 05:36:38.000 | 1 | 1 |
| 20200601 | 2020-06-01 05:39:38.000 | 2020-06-01 05:37:38.000 | 2 | 1 |
| 20200601 | 2020-06-01 05:41:38.000 | 2020-06-01 05:39:38.000 | 2 | 1 |
| 20200601 | 2020-06-01 07:26:38.000 | 2020-06-01 05:41:38.000 | 105 | 1 |
+----------+-------------------------+-------------------------+----------+----------+
I thought about two possibilities to continue:
Using a window function, like RANK(), I can partition the rows, but I can't figure out how to set the PARTITION BY caluse with a condition to do so.
To iterate the table with a WHILE loop - again finding hard time to form ths