1

I am struggling with a similar issue like in this thread: Check date split periods are continuous

Can someone help me translate the Qsebas' answer to Presto language? As this is exactly what I'd like to achieve.

I am not sure what supposed to be the outcome of this part:

CROSS APPLY  Enumerate ( ABS(DATEDIFF(d, From_Date, To_Date))) AS NUMBERS

And how to replace it. Would it be ROW_NUMBER () as a one column and ABS(DATE_DIFF()) in the second?

My sample data:

license_plate create_timestamp delete_timestamp
AA-AAA 2019-10-08 10:47:54 \N
AA-AAA 2021-01-22 12:37:21 2021-07-21 8:27:44
AA-AAA 2021-07-19 9:10:39 \N
BB-BBB 2016-04-15 8:38:59 2021-11-04 10:51:18
BB-BBB 2018-03-13 13:56:39 2021-10-07 08:21:07
BB-BBB 2021-12-23 12:42:31 \N
CC-CCC 2019-07-26 21:22:42 2021-12-17 18:21:37
CC-CCC 2021-11-05 11:08:13 2022-02-11 08:44:22

I would like to get sth like that:

license_plate create_timestamp delete_timestamp
AA-AAA 2019-10-08 10:47:54 \N
BB-BBB 2016-04-15 8:38:59 2021-10-07 08:21:07
BB-BBB 2021-12-23 12:42:31 \N
CC-CCC 2019-07-26 21:22:42 2022-02-11 08:44:22

There can be also a column count_ranges to count how many ranges for each license plate there were. There can be many entries for every license plate and they can overlap (so sorting by the create_timestamp and comparing row by row doesn't work). The idea is to group vehicles (license plates) into:

  • 'Active' (there was no gap since first create_timestamp till now/end_date); license_plate AA-AAA in my example
  • 'Reactivated' (there was at least on gap but the car is active now) - license_plate BB-BBB
  • 'Deleted' (there was no gap since first create_timestamp and last delete_timestamp but it is not active now) - license_plate CC-CCC

I know there could be more scenarios (i.e. 'Reactivated - Deleted', 'Reactivated - Reactivated' but this split will do for now.

If I had a table that I'd like I could then assign them to groups by:

CASE WHEN count_ranges > 1 THEN 'Reactivated'
           WHEN count_ranges = 1 AND delete_timestamp is null THEN 'Active'
           WHEN count_ranges = 1 AND delete_timestamp is not null THEN 'Deleted'
END AS vehicle_status

Maybe there is a different way to achieve this?

PS. Listing the exact delete_timestamp at the end of the range and create_timestamp at the beginning of the next range for a single license plate would be useful to check if a license plate was reactivated by the same user or by someone else.

Thanks!

2 Answers2

0

Using sample data from the linked answer I suggest to use gaps and islands approach - use lag to compare previous to_date with current from_date and then use the result for rolling sum to form the grouping:

-- sample data
WITH dataset (ref, from_date, to_date) AS (
    VALUES ('A', date '1997-01-04', date '1998-01-04'),
        ('A', date '1998-01-04', date '1998-05-27'),
        ('A', date '1998-05-27', date '1999-01-04'),
        ('B', date '1997-01-04', date '1998-01-04'),
        ('B', date '1998-01-04', date '1998-07-26'),
        ('B', date '2012-01-04', date '2013-01-04')
)

-- query
select ref,
    min(from_date) from_date,
    max(to_date) to_date,
    max(grp) over(partition by ref) + 1 count_ranges
from(
        select ref,
            from_date,
            to_date,
            sum(if(grp > 0, 1, 0)) over(partition by ref order by from_date, to_date) as grp
        from (
                select *,
                    coalesce(
                        date_diff('day', lag(to_date) over(partition by ref order by from_date, to_date), from_date),
                        0
                    ) as grp
                from dataset
            )
    )
group by ref, grp

Output:

ref from_date to_date count_ranges
A 1997-01-04 1999-01-04 1
B 1997-01-04 1998-07-26 2
B 2012-01-04 2013-01-04 2
Guru Stron
  • 102,774
  • 10
  • 95
  • 132
  • That's great but I can see on issue here (but I might be wrong of course). Does it compare row by row and 'forget' previous timestamps? Because they can overlap. create_timestamp1: 2019-10-08 10:47:54 delete_timestamp1: \N create_timestamp2: 2021-01-22 12:37:21 delete_timestamp2: 2021-07-11 8:27:44 create_timestamp3: 2021-07-19 9:10:39 delete_timestamp3: \N So there is a gap between delete2 and create3 but record 1 was never deleted. The vehicle should be counted as 'Active' but it will be as 'Resurrected' – Kamil Dworzecki Apr 04 '22 at 16:44
  • For calculations I would also add Coalesce(delete_timestamp,now()) to get rid of '\N' records. And the query should somehow 'remember' the highest previous timestamp when comparing to the next row. – Kamil Dworzecki Apr 04 '22 at 16:45
  • @KamilDworzecki I've used the data sample from the linked question and have not seen the update to yours. If this does not work for you I will try to dive deeper tomorrow if I'll have time. – Guru Stron Apr 04 '22 at 17:00
  • How about adding a column moving_max to the main data base? Is it possible to create sth like: Select Max(delete_timestamp) Over (Partition by license_plate order by create_timestamp) as moving_max FROM vehicles WHERE create_timestamp <= lead(create_timestamp) over(partition by license_plate order by create_timestamp, delete_timestamp) as moving_max Your query will work if I had a column like this I hope you understand. Sorry I am chaotic, just thinking loud before going sleep ;) – Kamil Dworzecki Apr 04 '22 at 20:58
  • @KamilDworzecki yes, using `max` over partitioned and ordered can do the trick. – Guru Stron Apr 05 '22 at 19:46
0

Thanks @Guru Stron, you directed me to the right solution!

I upgraded your code by adding one column to my initial database:

select v.*,
       max(delete_timestamp) over (partition by license_plate 
       order by create_timestamp, delete_timestamp ROWS UNBOUNDED PRECEDING) 
       as moving_max

from vehicles v

End then I compare next create_timestamp with previous moving_max instead of previous delete_timestamp

Didn't check with your part of code yet. But if it worked before (not as I'd like but as you intended ;) ) and my updated database looks good with the new column (every license plate has its moving_max calculated correctly) so it can't work wrong.