This is not the easiest to explain, so I will do my best, but please ask if you need more clarification.
I'm using hive SQL, and have a table containing user and event time. What I'm looking to calculate, as shown in the rownum column below, is an incrementing sequence that is partitioned by user and event time within a 4 hour window, that resets 4 hours after the first event_time that contains values.
So in the table below...
All of user A's records are between 00:01:00 - 04:00:00
User B has 3 events, between the first window of 00:01:00 - 04:01:00, and 2 events in the second window from 04:08:00 - 08:08:00
User C has 2 events in the first 00:01:00 - 04:01:00 window, 1 event in the second 04:10:00 - 08:10:00 window, and 2 events in the 08:15:00 - 12:15:00 window
Any help/pointers would be greatly appreciated!!
| user | event_time | rownum |
|------|------------|--------|
| A | 00:01:00 | 1 |
| A | 00:02:00 | 2 |
| A | 02:30:00 | 3 |
| B | 00:01:00 | 1 |
| B | 00:02:00 | 2 |
| B | 02:30:00 | 3 |
| B | 04:08:00 | 1 |
| B | 04:10:00 | 2 |
| C | 00:01:00 | 1 |
| C | 00:02:00 | 2 |
| C | 04:10:00 | 1 |
| C | 08:15:00 | 1 |
| C | 08:17:00 | 2 |