
I have a table with entry time, exit time, location id, row_num (auto-generated), gap (seconds since the previous visit), and final_group (a generated value that separates records whose gap is greater than 100 seconds).

I need to consolidate all visits whose gap is less than 100 seconds, which means getting the first entry time and the last exit time in each group. Please see the attached image (filter_table) for the data.

I thought of generating a new incremented value for each change in the value of final_group, so that I can easily group on it and find the min entry time and max exit time. I need output like the "final_group" column in the image.

Please help me fix this issue. Thank you in advance.
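(For illustration, the incremented-value idea from the question can be sketched in Python; the (entry_time, exit_time, gap) rows below are made up, since the real data is only in the attached image.)

```python
# A minimal sketch of the "incremented value per final_group change" idea.
# Sample rows are hypothetical: (entry_time, exit_time, gap).
visits = [(100, 110, 0), (150, 160, 40), (300, 310, 140), (350, 360, 40)]

group_id = 0
final_group = []
for entry_time, exit_time, gap in visits:
    if gap > 100:          # a gap over 100 seconds starts a new group
        group_id += 1
    final_group.append(group_id)

print(final_group)  # -> [0, 0, 1, 1]
```

Grouping on that value then makes min(entry_time)/max(exit_time) per group trivial.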

1 Answer


At first glance this seemed like it would require a multi-step answer: a temp table containing GROUPed results HAVING the total elapsed time between MIN(entry_time) and MAX(exit_time) < 100, UNIONed to the original table LEFT JOINed to the temp table WHERE NULL. But then I realized your gap isn't the time spent at the location but the time spent away. So...

If I understand the question correctly, this ought to do the trick:

select min(entry_time), max(exit_time), id, location_id 
    from filter_table 
    group by id, location_id, case when gap < 100 then 0 else entry_time end

I tried to keep the SQL as generic as possible. The trick is the CASE expression in the GROUP BY: it maps every gap < 100 to the same key, thereby grouping those rows together, while bigger gaps keep their unique entry_time and stay ungrouped.
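As a quick sanity check, here is the same query run against a tiny SQLite table; the column names follow the question, but the sample values are invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""create table filter_table (
    row_num integer, id integer, location_id integer,
    entry_time integer, exit_time integer, gap integer)""")
con.executemany("insert into filter_table values (?,?,?,?,?,?)", [
    (1, 1, 10, 100, 110,   0),   # first visit
    (2, 1, 10, 150, 160,  40),   # gap < 100: consolidates with row 1
    (3, 1, 10, 300, 310, 140),   # gap > 100: stands alone
])

rows = con.execute("""
    select min(entry_time), max(exit_time), id, location_id
        from filter_table
        group by id, location_id,
                 case when gap < 100 then 0 else entry_time end
        order by 1""").fetchall()
print(rows)  # -> [(100, 160, 1, 10), (300, 310, 1, 10)]
```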

Round 2 - Adding a Windowing Function

Turns out, we need to expand on the first solution (see comments).

select min(a.entry_time), max(a.exit_time), a.id, a.location_id 
    from 
        (select *, sum(Final_group) over (order by row_num rows unbounded preceding) as running_total
            from filter_table 
        ) a
    group by a.id, a.location_id, case when a.gap < 100 then 0 else a.entry_time end, a.running_total

By adding a windowing function in the new subquery, we can calculate a running total of Final_group and by grouping on that too, we can now distinguish between groups on either side of a gap > 100.
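The running-total idea can be sanity-checked in SQLite (window functions need SQLite 3.25+). Note that the grouping below uses the running total alone, the simplification the asker eventually settled on in the comments, so the row that opens a big gap merges with the small-gap rows that follow it; final_group is assumed to be 1 where gap > 100 and 0 otherwise:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""create table filter_table (
    row_num integer, id integer, location_id integer,
    entry_time integer, exit_time integer, gap integer, final_group integer)""")
con.executemany("insert into filter_table values (?,?,?,?,?,?,?)", [
    (1, 1, 10, 100, 110,   0, 0),
    (2, 1, 10, 150, 160,  40, 0),   # small gap: joins row 1
    (3, 1, 10, 300, 310, 140, 1),   # big gap: running total ticks up
    (4, 1, 10, 350, 360,  40, 0),   # small gap: joins row 3
])

rows = con.execute("""
    select min(a.entry_time), max(a.exit_time), a.id, a.location_id
        from
            (select *, sum(final_group) over
                    (order by row_num rows unbounded preceding) as running_total
                from filter_table
            ) a
        group by a.id, a.location_id, a.running_total
        order by 1""").fetchall()
print(rows)  # -> [(100, 160, 1, 10), (300, 360, 1, 10)]
```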

Round 3 - Use a Self Join Instead

I am reliably informed that MonetDB does not support windowing functions. So let's rewrite the subquery to use an old-school self join to get the running total:

select min(a.entry_time), max(a.exit_time), a.id, a.location_id 
    from 
        (select t1.entry_time, t1.exit_time, t1.id, t1.location_id, t1.gap, 
                sum(t2.Final_group) as running_total
            from filter_table t1
            inner join filter_table t2 on t2.row_num <= t1.row_num
            group by t1.entry_time, t1.exit_time, t1.id, t1.location_id, t1.gap
        ) a
    group by a.id, a.location_id, case when a.gap < 100 then 0 else a.entry_time end, a.running_total

Lastly - Moving the CASE Expression
MonetDB doesn't like the CASE expression in the GROUP BY clause either, so:

select min(a.entry_time), max(a.exit_time), a.id, a.location_id 
    from 
        (select t1.entry_time, t1.exit_time, t1.id, t1.location_id, t1.gap, 
                sum(t2.Final_group) as running_total,
                case when t1.gap < 100 then 0 else t1.entry_time end as case_gap
            from filter_table t1
            inner join filter_table t2 on t2.row_num <= t1.row_num
            group by t1.entry_time, t1.exit_time, t1.id, t1.location_id, t1.gap
        ) a
    group by a.id, a.location_id, a.case_gap, a.running_total
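The self-join variant can be checked the same way; as in the query the asker finally accepted (see the comments), the outer grouping here uses only the running total:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""create table filter_table (
    row_num integer, id integer, location_id integer,
    entry_time integer, exit_time integer, gap integer, final_group integer)""")
con.executemany("insert into filter_table values (?,?,?,?,?,?,?)", [
    (1, 1, 10, 100, 110,   0, 0),
    (2, 1, 10, 150, 160,  40, 0),
    (3, 1, 10, 300, 310, 140, 1),
    (4, 1, 10, 350, 360,  40, 0),
])

rows = con.execute("""
    select min(a.entry_time), max(a.exit_time), a.id, a.location_id
        from
            (select t1.entry_time, t1.exit_time, t1.id, t1.location_id, t1.gap,
                    sum(t2.final_group) as running_total
                from filter_table t1
                inner join filter_table t2 on t2.row_num <= t1.row_num
                group by t1.entry_time, t1.exit_time, t1.id,
                         t1.location_id, t1.gap
            ) a
        group by a.id, a.location_id, a.running_total
        order by 1""").fetchall()
print(rows)  # -> [(100, 160, 1, 10), (300, 360, 1, 10)]
```

Note that the self join compares every row with every earlier row, so its cost grows quadratically with table size.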

I hope this helps.

jwolf
  • Thanks for your reply. I still have the same issue, with row_num 8, 9, 10 in the filter_table image: row 8 should be one record and rows 9 & 10 another set. The query should consolidate consecutive visits based on the gap value, but it is grouping rows 8 through 10 directly together, which is incorrect. Please help me. Thanks for your time. – ABC D Feb 15 '18 at 08:52
  • @ABCD - I have an updated solution here for you. It now divides the groups where there is a gap > 100. If this works for you, please don't forget to 'Accept' this as the answer. Thanks. – jwolf Feb 15 '18 at 09:43
  • sum()/min()/max() over are not supported in MonetDB :-( – ABC D Feb 15 '18 at 11:42
  • @ABCD - I have added a 3rd variation that gets the running total without using OVER. This is an older way of doing it that just uses basic joins - I'm sure it's supported. If this works for you, please don't forget to 'Accept' this as the answer. Thanks. – jwolf Feb 15 '18 at 16:19
  • @ABCD - I don't know if you have looked at this yet, but I just updated that 3rd query a minute ago. I circled back to see if anything had happened with this and noticed that I had omitted something. It's fixed now. – jwolf Feb 15 '18 at 21:50
  • @ABCD - Have you seen this since I edited the third update? It should work fine in MonetDB now. Please don't forget to 'Accept' this as the answer if it helps you - I've put a decent amount of work into it. Thanks. – jwolf Feb 17 '18 at 02:30
  • thank you for your time. It is not working as expected: CASE is not supported in the GROUP BY, and the query is not grouping based on consecutive visits. I am still trying different ways. – ABC D Feb 17 '18 at 06:17
  • @ABCD - I think I can eliminate the Case in the Where clause. Would you like me to continue? – jwolf Feb 17 '18 at 06:24
  • @ABCD - It was an easy change so I went ahead and did it. As for grouping on consecutive rows, that's why I added running_total. Once you hit a Gap > 100, the running_total increments and because it's part of the Group By, it forces the rows before and after the Gap > 100 into different groups. Give it a try. – jwolf Feb 17 '18 at 06:36
  • @ABCD - Unless... The running_total is 'sorted' by row_num. Is row_num possibly not sequential (chronological)? I could use entry_time instead if that's the case. – jwolf Feb 17 '18 at 06:40
  • @ABCD - Cool, thanks - Should I sort by entry_time before you do or will row_num be good enough? – jwolf Feb 17 '18 at 07:40
  • @ABCD - ...I wrote it with row_num because it guarantees uniqueness, and as long as the row_num's are sequential, it's the better way to go. But if you ever need to change it, edit the line `inner join filter_table t2 on t2.row_num <= t1.row_num` to use entry_time instead of row_num. – jwolf Feb 17 '18 at 07:46
  • @jwolf- `select min(a.entry_time), max(a.exit_time), a.id, a.location_id from (select t1.entry_time, t1.exit_time, t1.id, t1.location_id, t1.gap, sum(t2.Final_group) as running_total from filter_table t1 inner join filter_table t2 on t2.row_num <= t1.row_num group by t1.entry_time, t1.exit_time, t1.id, t1.location_id, t1.gap ) a group by a.id, a.location_id, a.running_total;` This solved my problem. _Thanks a ton jwolf_. :-) – ABC D Feb 17 '18 at 08:27
  • @jwolf- I have millions of records, and this query is causing memory issues. Please suggest an alternative if you have one. – ABC D Mar 29 '18 at 10:59