0



This is not the easiest to explain, so I will do my best, but please ask if you need more clarification.
I'm using hive SQL, and have a table containing user and event time. What I'm looking to calculate, as shown in the rownum column below, is an incrementing sequence that is partitioned by user and event time within a 4 hour window, that resets 4 hours after the first event_time that contains values.

So in the table below...

All of user A's records are between 00:01:00 - 04:00:00
User B has 3 events, between the first window of 00:01:00 - 04:01:00, and 2 events in the second window from 04:08:00 - 08:08:00
User C has 2 events in the first 00:01:00 - 04:01:00 window, 1 event in the second 04:10:00 - 08:10:00 window, and 2 events in the 08:15:00 - 12:15:00 window

Any help/pointers would be greatly appreciated!!

| user | event_time | rownum |
|------|------------|--------|
| A    | 00:01:00   | 1      |
| A    | 00:02:00   | 2      |
| A    | 02:30:00   | 3      |
| B    | 00:01:00   | 1      |
| B    | 00:02:00   | 2      |
| B    | 02:30:00   | 3      |
| B    | 04:08:00   | 1      |
| B    | 04:10:00   | 2      |
| C    | 00:01:00   | 1      |
| C    | 00:02:00   | 2      |
| C    | 04:10:00   | 1      |
| C    | 08:15:00   | 1      |
| C    | 08:17:00   | 2      |
Rossy
  • 65
  • 1
  • 6
  • If you want the four-hour count to restart based on the first row in each group, then this requires a recursive CTE. I am not aware of a way of doing this using window functions. – Gordon Linoff Apr 15 '20 at 16:01
  • Thanks @GordonLinoff that gives me a starting point. I'm not too familiar with recursive CTE's, but will see how I get on with it – Rossy Apr 15 '20 at 16:18
  • . . Hive doesn't support recursive CTEs. I should have said that in the original comment. – Gordon Linoff Apr 15 '20 at 17:46
  • I'm still struggling with this. Would it make things easier if I said that the maximum rownum value I was looking for was 4...so basically for anything where the rownum is >4, just cap at 4? – Rossy May 07 '20 at 16:52
  • No. It would be possible in Hive if you said that you wanted to enumerate groups until there is a 4-hour difference between rows. – Gordon Linoff May 08 '20 at 00:20

0 Answers0