I have a PostgreSQL database (with PostGIS) and GTFS data (https://developers.google.com/transit/gtfs/reference) from several transit agencies imported. I have identified all potential transfer locations based on proximity, and populated a table with those data. I now want to find travel times between points in the region allowing for up to 2 transfers. I created a view that joins all my tables to make my queries to find travel times easier to read. Here is my view:
CREATE OR REPLACE VIEW trip_planning_data_view AS
SELECT b.agency_id, i.agency_name, h.route_id, h.route_long_name, h.route_short_name, h.route_type,
e.trip_headsign, e.direction_id, a.stop_id AS stop_id_a, c.stop_name AS origin_stop_name, a.arrival_time AS origin_arrival_time,
b.stop_id AS stop_id_b, d.stop_name AS destination_stop_name, b.arrival_time AS destination_arrival_time,
b.arrival_time - a.arrival_time AS travel_time,
g.agency_id_b AS transfer_agency_id, g.stop_id_b AS transfer_stop_id, g.distance_meters AS transfer_distance_meters,
(round(g.distance_meters / 60::double precision)::character varying || ' Minutes'::character varying)::interval AS transfer_time,
b.arrival_time + ((round(g.distance_meters / 60::double precision)::character varying || ' Minutes'::character varying)::interval) AS transfer_arrival_time
FROM stop_time a
JOIN stop_time b ON a.agency_id = b.agency_id AND a.trip_id = b.trip_id AND a.stop_id <> b.stop_id AND a.stop_sequence < b.stop_sequence AND a.arrival_time < b.arrival_time
JOIN stop c ON a.agency_id = c.agency_id AND a.stop_id = c.stop_id
JOIN stop d ON b.agency_id = d.agency_id AND b.stop_id = d.stop_id
JOIN trip e ON a.agency_id = e.agency_id AND a.trip_id = e.trip_id
JOIN calendar f ON e.agency_id = f.agency_id AND e.service_id = f.service_id
LEFT JOIN stop_transfers g ON b.agency_id = g.agency_id_a AND b.stop_id = g.stop_id_a
JOIN route h ON e.agency_id = h.agency_id AND e.route_id = h.route_id
JOIN agency i ON h.agency_id = i.agency_id
WHERE f.monday = true
ORDER BY a.stop_id, b.arrival_time - a.arrival_time;
(I am only interested in Monday trips, and I don't know why, but the ORDER BY clause in the view made a huge performance improvement.)
These tables comply with GTFS file structures with the addition of the stop_transfers table which contains the agency and stop IDs where transfers can be made as well as the distance between the stops.
Querying this view for trips with 1 transfer is pretty quick (generally under 1 second), however queries for 2 transfer trips takes a VERY long time (several minutes). Here is an example of a 2 transfer trip query:
select *
from trip_planning_data_view t0
join trip_planning_data_view t1 on t0.transfer_agency_id = t1.agency_id and t0.transfer_stop_id = t1.stop_id_a
join trip_planning_data_view t2 on t1.transfer_agency_id = t2.agency_id and t1.transfer_stop_id = t2.stop_id_a
where t0.agency_id = '1A'
and t0.stop_id_a = 's101'
and t0.origin_arrival_time between ('08:00:00'::interval) and ('08:00:00'::interval + '30 minutes'::interval )
and t1.origin_arrival_time between (t0.origin_arrival_time + t0.travel_time + t0.transfer_time) and (t0.origin_arrival_time + '30 minutes'::interval + t0.travel_time + t0.transfer_time)
and t2.agency_id = '1A'
and t2.stop_id_b = 's247'
and t2.origin_arrival_time between (t1.origin_arrival_time + t1.travel_time + t1.transfer_time) and (t1.origin_arrival_time + '30 minutes'::interval + t1.travel_time + t1.transfer_time)
Here is the query plan:
Nested Loop (cost=168984.47..203333.30 rows=1 width=651)
-> Nested Loop (cost=168984.47..203324.90 rows=1 width=686)
Join Filter: (((g.stop_id_b)::text = (a.stop_id)::text) AND (a.arrival_time >= ((a.arrival_time + (b.arrival_time - a.arrival_time)) + ((((round((g.distance_meters / 60::double precision)))::character varying)::text || ' Minutes'::text))::interval)) AND (a.arrival_time <= (((a.arrival_time + '00:30:00'::interval) + (b.arrival_time - a.arrival_time)) + ((((round((g.distance_meters / 60::double precision)))::character varying)::text || ' Minutes'::text))::interval)))
-> Nested Loop (cost=0.00..117.22 rows=1 width=252)
Join Filter: ((a.agency_id)::text = (h.agency_id)::text)
-> Nested Loop (cost=0.00..108.94 rows=1 width=216)
-> Nested Loop (cost=0.00..100.65 rows=1 width=220)
Join Filter: ((a.agency_id)::text = (e.agency_id)::text)
-> Nested Loop (cost=0.00..91.92 rows=1 width=198)
-> Nested Loop (cost=0.00..83.50 rows=1 width=161)
Join Filter: (((a.stop_id)::text <> (b.stop_id)::text) AND (a.stop_sequence < b.stop_sequence) AND (a.arrival_time < b.arrival_time))
-> Nested Loop Left Join (cost=0.00..42.66 rows=1 width=112)
-> Nested Loop (cost=0.00..34.29 rows=1 width=90)
-> Index Scan using st_a_s_idx on stop_time b (cost=0.00..25.88 rows=1 width=53)
Index Cond: (((agency_id)::text = '1A'::text) AND ((stop_id)::text = 's247'::text))
-> Index Scan using a_stop_idx on stop d (cost=0.00..8.40 rows=1 width=44)
Index Cond: (((agency_id)::text = (b.agency_id)::text) AND ((stop_id)::text = (b.stop_id)::text))
-> Index Scan using stop_transfers_as_a_idx on stop_transfers g (cost=0.00..8.35 rows=1 width=36)
Index Cond: (((b.agency_id)::text = (agency_id_a)::text) AND ((b.stop_id)::text = (stop_id_a)::text))
-> Index Scan using st_a_t_idx on stop_time a (cost=0.00..40.78 rows=3 width=53)
Index Cond: (((agency_id)::text = (b.agency_id)::text) AND ((trip_id)::text = (b.trip_id)::text))
-> Index Scan using a_stop_idx on stop c (cost=0.00..8.40 rows=1 width=44)
Index Cond: (((agency_id)::text = (a.agency_id)::text) AND ((stop_id)::text = (a.stop_id)::text))
-> Index Scan using trip_id_idx on trip e (cost=0.00..8.71 rows=1 width=80)
Index Cond: ((trip_id)::text = (a.trip_id)::text)
-> Index Scan using a_s_idx on calendar f (cost=0.00..8.28 rows=1 width=20)
Index Cond: (((agency_id)::text = (a.agency_id)::text) AND ((service_id)::text = (e.service_id)::text))
Filter: monday
-> Index Scan using route_id_idx on route h (cost=0.00..8.27 rows=1 width=41)
Index Cond: ((route_id)::text = (e.route_id)::text)
-> Nested Loop (cost=168984.47..203207.60 rows=1 width=434)
-> Nested Loop (cost=168984.47..203199.32 rows=1 width=477)
-> Nested Loop (cost=168984.47..203191.04 rows=1 width=520)
Join Filter: (((g.agency_id_b)::text = (b.agency_id)::text) AND ((g.stop_id_b)::text = (a.stop_id)::text) AND (a.arrival_time >= ((a.arrival_time + (b.arrival_time - a.arrival_time)) + ((((round((g.distance_meters / 60::double precision)))::character varying)::text || ' Minutes'::text))::interval)) AND (a.arrival_time <= (((a.arrival_time + '00:30:00'::interval) + (b.arrival_time - a.arrival_time)) + ((((round((g.distance_meters / 60::double precision)))::character varying)::text || ' Minutes'::text))::interval)))
-> Nested Loop (cost=168933.50..178461.70 rows=1 width=260)
-> Nested Loop (cost=168933.50..178453.41 rows=1 width=264)
-> Nested Loop (cost=168933.50..178444.99 rows=1 width=227)
Join Filter: (((a.stop_id)::text <> (b.stop_id)::text) AND (a.stop_sequence < b.stop_sequence) AND (a.arrival_time < b.arrival_time))
-> Nested Loop (cost=168933.50..178404.16 rows=1 width=236)
Join Filter: ((b.agency_id)::text = (h.agency_id)::text)
-> Nested Loop (cost=168933.50..178387.59 rows=2 width=200)
Join Filter: ((b.agency_id)::text = (e.agency_id)::text)
-> Nested Loop (cost=168933.50..177724.50 rows=76 width=120)
-> Merge Join (cost=168933.50..170942.05 rows=869 width=89)
Merge Cond: (((b.agency_id)::text = (g.agency_id_a)::text) AND ((b.stop_id)::text = (g.stop_id_a)::text))
-> Sort (cost=144224.83..144325.07 rows=40096 width=53)
Sort Key: b.agency_id, b.stop_id
-> Bitmap Heap Scan on stop_time b (cost=1068.60..141159.25 rows=40096 width=53)
Recheck Cond: ((agency_id)::text = '1A'::text)
-> Bitmap Index Scan on st_a_s_idx (cost=0.00..1058.58 rows=40096 width=0)
Index Cond: ((agency_id)::text = '1A'::text)
-> Sort (cost=24708.45..25274.92 rows=226587 width=36)
Sort Key: g.agency_id_a, g.stop_id_a
-> Seq Scan on stop_transfers g (cost=0.00..4553.87 rows=226587 width=36)
-> Index Scan using a_stop_idx on stop d (cost=0.00..7.79 rows=1 width=44)
Index Cond: (((agency_id)::text = (b.agency_id)::text) AND ((stop_id)::text = (b.stop_id)::text))
-> Index Scan using trip_id_idx on trip e (cost=0.00..8.71 rows=1 width=80)
Index Cond: ((trip_id)::text = (b.trip_id)::text)
-> Index Scan using route_id_idx on route h (cost=0.00..8.27 rows=1 width=41)
Index Cond: ((route_id)::text = (e.route_id)::text)
-> Index Scan using st_a_t_idx on stop_time a (cost=0.00..40.80 rows=1 width=53)
Index Cond: (((agency_id)::text = (b.agency_id)::text) AND ((trip_id)::text = (b.trip_id)::text))
Filter: ((arrival_time >= '08:11:00'::interval) AND (arrival_time <= '08:30:00'::interval) AND ((stop_id)::text = 's101'::text))
-> Index Scan using a_stop_idx on stop c (cost=0.00..8.40 rows=1 width=44)
Index Cond: (((agency_id)::text = (a.agency_id)::text) AND ((stop_id)::text = (a.stop_id)::text))
-> Index Scan using a_s_idx on calendar f (cost=0.00..8.28 rows=1 width=20)
Index Cond: (((agency_id)::text = (a.agency_id)::text) AND ((service_id)::text = (e.service_id)::text))
Filter: monday
-> Nested Loop (cost=50.97..24729.27 rows=1 width=260)
-> Nested Loop (cost=50.97..24720.85 rows=1 width=223)
Join Filter: (((a.stop_id)::text <> (b.stop_id)::text) AND (a.stop_sequence < b.stop_sequence) AND (a.arrival_time < b.arrival_time))
-> Nested Loop (cost=50.97..24680.01 rows=1 width=232)
-> Nested Loop (cost=50.97..24663.42 rows=2 width=236)
Join Filter: ((b.agency_id)::text = (h.agency_id)::text)
-> Nested Loop (cost=50.97..24447.16 rows=29 width=200)
Join Filter: ((b.agency_id)::text = (e.agency_id)::text)
-> Nested Loop (cost=50.97..15148.58 rows=1096 width=120)
-> Nested Loop (cost=50.97..12475.29 rows=59 width=80)
-> Bitmap Heap Scan on stop_transfers g (cost=50.97..2141.81 rows=1375 width=36)
Recheck Cond: ((agency_id_b)::text = '1A'::text)
-> Bitmap Index Scan on stop_transfers_as_b_idx (cost=0.00..50.63 rows=1375 width=0)
Index Cond: ((agency_id_b)::text = '1A'::text)
-> Index Scan using a_stop_idx on stop d (cost=0.00..7.50 rows=1 width=44)
Index Cond: (((agency_id)::text = (g.agency_id_a)::text) AND ((stop_id)::text = (g.stop_id_a)::text))
-> Index Scan using st_a_s_idx on stop_time b (cost=0.00..45.22 rows=6 width=53)
Index Cond: (((agency_id)::text = (d.agency_id)::text) AND ((stop_id)::text = (d.stop_id)::text))
-> Index Scan using trip_id_idx on trip e (cost=0.00..8.47 rows=1 width=80)
Index Cond: ((trip_id)::text = (b.trip_id)::text)
-> Index Scan using route_id_idx on route h (cost=0.00..7.44 rows=1 width=41)
Index Cond: ((route_id)::text = (e.route_id)::text)
-> Index Scan using a_s_idx on calendar f (cost=0.00..8.28 rows=1 width=20)
Index Cond: (((agency_id)::text = (b.agency_id)::text) AND ((service_id)::text = (e.service_id)::text))
Filter: monday
-> Index Scan using st_a_t_idx on stop_time a (cost=0.00..40.78 rows=3 width=53)
Index Cond: (((agency_id)::text = (b.agency_id)::text) AND ((trip_id)::text = (b.trip_id)::text))
-> Index Scan using a_stop_idx on stop c (cost=0.00..8.40 rows=1 width=44)
Index Cond: (((agency_id)::text = (a.agency_id)::text) AND ((stop_id)::text = (a.stop_id)::text))
-> Index Scan using agency_id_idx on agency i (cost=0.00..8.27 rows=1 width=31)
Index Cond: ((agency_id)::text = (a.agency_id)::text)
-> Index Scan using agency_id_idx on agency i (cost=0.00..8.27 rows=1 width=31)
Index Cond: ((agency_id)::text = (a.agency_id)::text)
-> Index Scan using agency_id_idx on agency i (cost=0.00..8.27 rows=1 width=31)
Index Cond: ((agency_id)::text = (a.agency_id)::text)
The query plan appears to be making use of indexes. Any suggestions to optimize this or a better approach would be greatly appreciated. Thanks in advance.