I am working on a project with sample data for vehicle report relays the data is as follows: (timetransmittedtz
is a timestamp)
device_uuid | vin | jurisdiction | odometer | timetransmitted_tz | readingtype |
---|---|---|---|---|---|
00012fg5-0b35-123c-456c-789b1d344f | 3LSDHTXR2RN289414 | Quebec | 195195.6 | 12/1/2022 0:03 | Start of Day |
00012fg5-0b35-123c-456c-789b1d344f | 3LSDHTXR2RN289414 | Quebec | 195390.7 | 12/1/2022 22:37 | End of Day |
00012fg5-0b35-123c-456c-789b1d344f | 3LSDHTXR2RN289414 | Quebec | 198588.9 | 12/28/2022 0:56 | Start of Day |
00012fg5-0b35-123c-456c-789b1d344f | 3LSDHTXR2RN289414 | Ontario | 198745.5 | 12/28/2022 12:21 | Change of State |
00012fg5-0b35-123c-456c-789b1d344f | 3LSDHTXR2RN289414 | Quebec | 199022.2 | 12/28/2022 17:07 | Change of State |
00012fg5-0b35-123c-456c-789b1d344f | 3LSDHTXR2RN289414 | Quebec | 199090.9 | 12/28/2022 22:13 | End of Day |
I am trying to create an SQL query for AWS Redshift to get this desired output from the above data, keep in mind there can be multiple change of state events throughout the day for any particular device.
device_uuid | vin | jurisdiction | start_date | end_date | start_odometer | end_odometer |
---|---|---|---|---|---|---|
00012fg5-0b35-123c-456c-789b1d344f | 3LSDHTXR2RN289414 | Quebec | 12/1/2022 0:03 | 12/1/2022 22:37 | 195195.6 | 195390.7 |
00012fg5-0b35-123c-456c-789b1d344f | 3LSDHTXR2RN289414 | Quebec | 12/28/2022 0:56 | 12/28/2022 12:21 | 198588.9 | 198745.5 |
00012fg5-0b35-123c-456c-789b1d344f | 3LSDHTXR2RN289414 | Ontario | 12/28/2022 12:21 | 12/28/2022 17:07 | 198745.5 | 199022.2 |
00012fg5-0b35-123c-456c-789b1d344f | 3LSDHTXR2RN289414 | Quebec | 12/28/2022 17:07 | 12/28/2022 22:13 | 199022.2 | 199090.9 |
I tried many, many versions of sql queries and I am starting to feel like this is impossible.
Here is one query:
WITH start_end_day AS (
SELECT
device_uuid,
MIN(CASE WHEN readingtype = 'Start of Day' THEN timetransmitted_tz END) AS start_date,
MIN(CASE WHEN readingtype = 'Start of Day' THEN odometer END) AS start_odometer,
MAX(CASE WHEN readingtype = 'End of Day' THEN timetransmitted_tz END) AS end_date,
MAX(CASE WHEN readingtype = 'End of Day' THEN odometer END) AS end_odometer,
TRUNC(timetransmitted_tz) AS day
FROM "prodenv"."public"."telemetry_jurisdictionchange"
WHERE readingtype IN ('Start of Day','End of Day')
GROUP BY device_uuid, TRUNC(timetransmitted_tz)
)
SELECT
A.vin,
A.jurisdiction,
B.start_date,
COALESCE(MAX(CASE WHEN A.readingtype = 'Change of State' THEN A.timetransmitted_tz END), B.end_date) AS end_date,
B.start_odometer,
MAX(CASE WHEN A.readingtype = 'Change of State' THEN odometer END) AS end_odometer,
A.device_uuid,
A.client_uuid
FROM "prodenv"."public"."telemetry_jurisdictionchange" A
INNER JOIN start_end_day B
ON A.device_uuid = B.device_uuid AND TRUNC(A.timetransmitted_tz) = B.day
WHERE A.readingtype IN ('Start of Day', 'Change of State')
GROUP BY A.vin, A.jurisdiction, B.start_date, B.start_odometer, B.end_odometer, A.device_uuid, A.client_uuid, B.end_date
ORDER BY A.device_uuid, A.vin, B.start_date;
This query does not give me the desired output, but it is the closest I have gotten. Just the end_date values are NULL (should not be null) and for devcies that have 'Change of State' events it's not null.
Perhapse I dont have to focus on the readingtype and just focus on squence of events per-day? This feels impossible, if so why?