1

I am working on a project with sample data for vehicle report relays the data is as follows: (timetransmittedtz is a timestamp)

device_uuid vin jurisdiction odometer timetransmitted_tz readingtype
00012fg5-0b35-123c-456c-789b1d344f 3LSDHTXR2RN289414 Quebec 195195.6 12/1/2022 0:03 Start of Day
00012fg5-0b35-123c-456c-789b1d344f 3LSDHTXR2RN289414 Quebec 195390.7 12/1/2022 22:37 End of Day
00012fg5-0b35-123c-456c-789b1d344f 3LSDHTXR2RN289414 Quebec 198588.9 12/28/2022 0:56 Start of Day
00012fg5-0b35-123c-456c-789b1d344f 3LSDHTXR2RN289414 Ontario 198745.5 12/28/2022 12:21 Change of State
00012fg5-0b35-123c-456c-789b1d344f 3LSDHTXR2RN289414 Quebec 199022.2 12/28/2022 17:07 Change of State
00012fg5-0b35-123c-456c-789b1d344f 3LSDHTXR2RN289414 Quebec 199090.9 12/28/2022 22:13 End of Day

I am trying to create an SQL query for AWS Redshift to get this desired output from the above data, keep in mind there can be multiple change of state events throughout the day for any particular device.

device_uuid vin jurisdiction start_date end_date start_odometer end_odometer
00012fg5-0b35-123c-456c-789b1d344f 3LSDHTXR2RN289414 Quebec 12/1/2022 0:03 12/1/2022 22:37 195195.6 195390.7
00012fg5-0b35-123c-456c-789b1d344f 3LSDHTXR2RN289414 Quebec 12/28/2022 0:56 12/28/2022 12:21 198588.9 198745.5
00012fg5-0b35-123c-456c-789b1d344f 3LSDHTXR2RN289414 Ontario 12/28/2022 12:21 12/28/2022 17:07 198745.5 199022.2
00012fg5-0b35-123c-456c-789b1d344f 3LSDHTXR2RN289414 Quebec 12/28/2022 17:07 12/28/2022 22:13 199022.2 199090.9

I tried many, many versions of sql queries and I am starting to feel like this is impossible.

Here is one query:

WITH start_end_day AS (
  SELECT
    device_uuid,
    MIN(CASE WHEN readingtype = 'Start of Day' THEN timetransmitted_tz END) AS start_date,
    MIN(CASE WHEN readingtype = 'Start of Day' THEN odometer END) AS start_odometer,
    MAX(CASE WHEN readingtype = 'End of Day' THEN timetransmitted_tz END) AS end_date,
    MAX(CASE WHEN readingtype = 'End of Day' THEN odometer END) AS end_odometer,
    TRUNC(timetransmitted_tz) AS day
  FROM "prodenv"."public"."telemetry_jurisdictionchange"
  WHERE readingtype IN ('Start of Day','End of Day')
  GROUP BY device_uuid, TRUNC(timetransmitted_tz)
)
SELECT
  A.vin,
  A.jurisdiction,
  B.start_date,
  COALESCE(MAX(CASE WHEN A.readingtype = 'Change of State' THEN A.timetransmitted_tz END), B.end_date) AS end_date,
  B.start_odometer,
  MAX(CASE WHEN A.readingtype = 'Change of State' THEN odometer END) AS end_odometer,
  A.device_uuid,
  A.client_uuid
FROM "prodenv"."public"."telemetry_jurisdictionchange" A
INNER JOIN start_end_day B
  ON A.device_uuid = B.device_uuid AND TRUNC(A.timetransmitted_tz) = B.day
WHERE A.readingtype IN ('Start of Day', 'Change of State')
GROUP BY A.vin, A.jurisdiction, B.start_date, B.start_odometer, B.end_odometer, A.device_uuid, A.client_uuid, B.end_date
ORDER BY A.device_uuid, A.vin, B.start_date;

This query does not give me the desired output, but it is the closest I have gotten. Just the end_date values are NULL (should not be null) and for devcies that have 'Change of State' events it's not null.

Perhapse I dont have to focus on the readingtype and just focus on squence of events per-day? This feels impossible, if so why?

John Rotenstein
  • 241,921
  • 22
  • 380
  • 470
  • Thank you for providing the table examples. Would you mind, in a mix of plain English and pseudocode, describe precisely what you are trying to accomplish? Not how, just want the end result should be. – ClearlyClueless Feb 06 '23 at 04:32

1 Answers1

0

This seems to call for LAG() window function. Anytime you want to grab data from another row think window functions.

Here's a solution for your test data. Set up test table:

create table test (
  device_uuid varchar(64),  
  vin varchar(32),  
  jurisdiction varchar(20), 
  odometer decimal(15,1),   
  timetransmitted_tz timestamp, 
  readingtype varchar(16)
);

insert into test values 
('00012fg5-0b35-123c-456c-789b1d344f', '3LSDHTXR2RN289414', 'Quebec', 195195.6, '12/1/2022 0:03', 'Start of Day'),
('00012fg5-0b35-123c-456c-789b1d344f', '3LSDHTXR2RN289414', 'Quebec', 195390.7, '12/1/2022 22:37', 'End of Day'),
('00012fg5-0b35-123c-456c-789b1d344f', '3LSDHTXR2RN289414', 'Quebec', 198588.9, '12/28/2022 0:56', 'Start of Day'),
('00012fg5-0b35-123c-456c-789b1d344f', '3LSDHTXR2RN289414', 'Ontario', 198745.5, '12/28/2022 12:21', 'Change of State'),
('00012fg5-0b35-123c-456c-789b1d344f', '3LSDHTXR2RN289414', 'Quebec', 199022.2, '12/28/2022 17:07', 'Change of State'),
('00012fg5-0b35-123c-456c-789b1d344f', '3LSDHTXR2RN289414', 'Quebec', 199090.9, '12/28/2022 22:13', 'End of Day');

Query:

with split as (
  select device_uuid, vin, jurisdiction, odometer, 
        timetransmitted_tz, readingtype,
        case when readingtype = 'Start of Day' 
            then timetransmitted_tz 
            else lag(timetransmitted_tz) 
              over (partition by device_uuid 
                    order by timetransmitted_tz) end as start_date,
        case when readingtype = 'Start of Day' 
            then odometer 
            else lag(odometer)
              over (partition by device_uuid 
                    order by timetransmitted_tz) end as start_odometer,
        case when readingtype = 'Change of State' 
            then timetransmitted_tz
            when readingtype = 'End of Day' 
            then timetransmitted_tz end as end_date,
        case when readingtype = 'Change of State' 
            then odometer
            when readingtype = 'End of Day'
            then odometer end as end_odometer
  from test)
select device_uuid, vin, jurisdiction, 
        start_date, 
        end_date, 
        start_odometer, 
        end_odometer
from split
where readingtype <> 'Start of Day';

No grouping or joining. Just grab the preceding timestamp as start_time and then eliminate the readingtype of 'Start of Day' as you only want rows that define end times. I carried along the date and odometer columns fully as I though you might need them for your real world query. If not then all the case statements can be eliminated and would look like this:

with split as (
  select device_uuid, vin, jurisdiction, odometer, 
        timetransmitted_tz, readingtype,
        lag(timetransmitted_tz) over (partition by device_uuid 
                    order by timetransmitted_tz) as start_date,
        lag(odometer) over (partition by device_uuid 
                    order by timetransmitted_tz) as start_odometer,
        timetransmitted_tz as end_date,
        odometer as end_odometer
  from test)
select device_uuid, vin, jurisdiction, 
        start_date, 
        end_date, 
        start_odometer, 
        end_odometer
from split
where readingtype <> 'Start of Day';
Bill Weiner
  • 8,835
  • 2
  • 7
  • 18