Introduction
(DB-fiddle here: https://dbfiddle.uk/o5gsnxpt)
Say you manage busses and that you have two tables, bus_cancellations
and bus_timetables
:
The table bus_cancellations
contains all cancellations of bus journeys (only one shown here) where each row represents the part of the timetable that has been cancelled (e.g., the whole timetable = bus is completely cancelled that day | partial cancellation = the bus will drive between certain stops only). Furthermore, the time of when the bus journey was cancelled is captured in the column cancellation_time
.
bus_id | driving_day | cancellation_type | stop_from | stop_to | cancellation_time |
---|---|---|---|---|---|
1 | 2023-05-01 | Partial Start | Stop 1 | Stop 3 | 2023-05-01T12:21:54 |
bus_timetables
: A table containing all the timetable versions that have ever been present for all busses on all days. A bus can have multiple timetables through time as planning might change timetables well before the day that the bus will drive and changes can happen to the timetable of a bus after the bus should have departed (had it been cancelled or not). The time of when the timetable was in effect is captured in the column timetable_creation_time
.
bus_id | driving_day | timetable_version | stop | timetable_stop_entry_position | arrive_time | depart_time | timetable_creation_time |
---|---|---|---|---|---|---|---|
1 | 2023-05-01 | 1 | Stop 1 | 0 | NULL | 13:42 | 2023-03-14T13:43:21 |
1 | 2023-05-01 | 1 | Stop 2 | 1 | 13:54 | 13:56 | 2023-03-14T13:43:21 |
1 | 2023-05-01 | 1 | Stop 3 | 2 | 14:04 | 14:05 | 2023-03-14T13:43:21 |
1 | 2023-05-01 | 1 | Stop 4 | 3 | 14:13 | NULL | 2023-03-14T13:43:21 |
1 | 2023-05-01 | 2 | Stop 1 | 0 | NULL | 13:56 | 2023-05-01T04:42:32 |
1 | 2023-05-01 | 2 | Stop 2 | 1 | 14:08 | 14:10 | 2023-05-01T04:42:32 |
1 | 2023-05-01 | 2 | Stop 3 | 2 | 14:18 | 14:19 | 2023-05-01T04:42:32 |
1 | 2023-05-01 | 2 | Stop 4 | 3 | 14:27 | NULL | 2023-05-01T04:42:32 |
1 | 2023-05-01 | 3 | Stop 1 | 0 | NULL | 14:01 | 2023-05-02T12:43:27 |
1 | 2023-05-01 | 3 | Stop 2 | 1 | 14:13 | 14:15 | 2023-05-02T12:43:27 |
1 | 2023-05-01 | 3 | Stop 3 | 2 | 14:23 | 14:24 | 2023-05-02T12:43:27 |
1 | 2023-05-01 | 3 | Stop 4 | 3 | 14:34 | NULL | 2023-05-02T12:43:27 |
Objective
In the
bus_timetables
table, identify the timetable that was in effect at the time of the bus getting cancelled (bus_cancellations.cancellation_time
andbus_timetables.timetable_creation_time
). As far as I can reason, this would be the most recent timetable among the set of timetables that have a creation time on or before the bus cancellation time.Expand the cancelled part of the timetable (the part of the timetable between
stop_from
andstop_to
inbus_cancellations
) into a row per relevant entry in thebus_timetables
table (so as to get the correct stops information on a stop-per-row level.
Using above example tables, this would result in a table like the following:
bus_id | driving_day | cancellation_type | stop_from | stop_to | cancellation_time | stop | arrive_time | depart_time |
---|---|---|---|---|---|---|---|---|
1 | 2023-05-01 | Partial Start | Stop 1 | Stop 3 | 2023-05-01T12:21:54 | Stop 1 | NULL | 13:56 |
1 | 2023-05-01 | Partial Start | Stop 1 | Stop 3 | 2023-05-01T12:21:54 | Stop 2 | 14:08 | 14:10 |
1 | 2023-05-01 | Partial Start | Stop 1 | Stop 3 | 2023-05-01T12:21:54 | Stop 3 | 14:18 | 14:19 |
Progress so far
For objective 1, in order to retrieve the correct rows (those that are from the timetable that was in effect at the time of the cancellation), I've tried working with lateral join, but so far I haven't been able to get any query working.
For objective 2, I've understood that one way of expanding the stops between the stop_from
and stop_to
values in bus_cancellations
would be to use a correlated subquery (unsure if this is also a suboptimal solution to this part of the query):
SELECT
c.bus_id,
c.driving_day,
stop_from,
stop_to,
stop,
timetable_stop_entry_position,
t.timetable_version,
t.timetable_creation_time
FROM bus_cancellations AS c
LEFT JOIN bus_timetables AS t
ON
c.bus_id = t.bus_id
AND c.driving_day = t.driving_day
WHERE
timetable_stop_entry_position BETWEEN (SELECT timetable_stop_entry_position FROM bus_timetables WHERE stop = c.stop_from LIMIT 1)
AND (SELECT timetable_stop_entry_position FROM bus_timetables WHERE stop = c.stop_to LIMIT 1);
So far, I've been unable to figure out a complete (and performant) solution that solves both objectives.