0

I have a PostgreSQL database (with PostGIS) and GTFS data (https://developers.google.com/transit/gtfs/reference) from several transit agencies imported. I have identified all potential transfer locations based on proximity, and populated a table with those data. I now want to find travel times between points in the region allowing for up to 2 transfers. I created a view that joins all my tables to make my queries to find travel times easier to read. Here is my view:

CREATE OR REPLACE VIEW trip_planning_data_view AS 
 SELECT b.agency_id, i.agency_name, h.route_id, h.route_long_name, h.route_short_name, h.route_type, 
    e.trip_headsign, e.direction_id, a.stop_id AS stop_id_a, c.stop_name AS origin_stop_name, a.arrival_time AS origin_arrival_time, 
    b.stop_id AS stop_id_b, d.stop_name AS destination_stop_name, b.arrival_time AS destination_arrival_time, 
    b.arrival_time - a.arrival_time AS travel_time, 
    g.agency_id_b AS transfer_agency_id, g.stop_id_b AS transfer_stop_id, g.distance_meters AS transfer_distance_meters, 
    (round(g.distance_meters / 60::double precision)::character varying || ' Minutes'::character varying)::interval AS transfer_time, 
    b.arrival_time + ((round(g.distance_meters / 60::double precision)::character varying || ' Minutes'::character varying)::interval) AS transfer_arrival_time
   FROM stop_time a
   JOIN stop_time b ON a.agency_id = b.agency_id AND a.trip_id = b.trip_id AND a.stop_id <> b.stop_id AND a.stop_sequence < b.stop_sequence AND a.arrival_time < b.arrival_time
   JOIN stop c ON a.agency_id = c.agency_id AND a.stop_id = c.stop_id
   JOIN stop d ON b.agency_id = d.agency_id AND b.stop_id = d.stop_id
   JOIN trip e ON a.agency_id = e.agency_id AND a.trip_id = e.trip_id
   JOIN calendar f ON e.agency_id = f.agency_id AND e.service_id = f.service_id
   LEFT JOIN stop_transfers g ON b.agency_id = g.agency_id_a AND b.stop_id = g.stop_id_a
   JOIN route h ON e.agency_id = h.agency_id AND e.route_id = h.route_id
   JOIN agency i ON h.agency_id = i.agency_id
  WHERE f.monday = true
  ORDER BY a.stop_id, b.arrival_time - a.arrival_time;

(I am only interested in Monday trips, and I don't know why, but the ORDER BY clause in the view made a huge performance improvement.)

These tables comply with GTFS file structures with the addition of the stop_transfers table which contains the agency and stop IDs where transfers can be made as well as the distance between the stops.

Querying this view for trips with 1 transfer is pretty quick (generally under 1 second), however queries for 2 transfer trips takes a VERY long time (several minutes). Here is an example of a 2 transfer trip query:

select *
from trip_planning_data_view t0 
join trip_planning_data_view t1 on t0.transfer_agency_id = t1.agency_id and t0.transfer_stop_id = t1.stop_id_a 
join trip_planning_data_view t2 on t1.transfer_agency_id = t2.agency_id and t1.transfer_stop_id = t2.stop_id_a 
where t0.agency_id = '1A' 
and t0.stop_id_a = 's101' 
and t0.origin_arrival_time between ('08:00:00'::interval) and ('08:00:00'::interval + '30 minutes'::interval )
and t1.origin_arrival_time between (t0.origin_arrival_time + t0.travel_time + t0.transfer_time) and (t0.origin_arrival_time + '30 minutes'::interval + t0.travel_time + t0.transfer_time) 
and t2.agency_id = '1A' 
and t2.stop_id_b = 's247' 
and t2.origin_arrival_time between (t1.origin_arrival_time + t1.travel_time + t1.transfer_time) and (t1.origin_arrival_time + '30 minutes'::interval + t1.travel_time + t1.transfer_time) 

Here is the query plan:

Nested Loop  (cost=168984.47..203333.30 rows=1 width=651)
  ->  Nested Loop  (cost=168984.47..203324.90 rows=1 width=686)
        Join Filter: (((g.stop_id_b)::text = (a.stop_id)::text) AND (a.arrival_time >= ((a.arrival_time + (b.arrival_time - a.arrival_time)) + ((((round((g.distance_meters / 60::double precision)))::character varying)::text || ' Minutes'::text))::interval)) AND (a.arrival_time <= (((a.arrival_time + '00:30:00'::interval) + (b.arrival_time - a.arrival_time)) + ((((round((g.distance_meters / 60::double precision)))::character varying)::text || ' Minutes'::text))::interval)))
        ->  Nested Loop  (cost=0.00..117.22 rows=1 width=252)
              Join Filter: ((a.agency_id)::text = (h.agency_id)::text)
              ->  Nested Loop  (cost=0.00..108.94 rows=1 width=216)
                    ->  Nested Loop  (cost=0.00..100.65 rows=1 width=220)
                          Join Filter: ((a.agency_id)::text = (e.agency_id)::text)
                          ->  Nested Loop  (cost=0.00..91.92 rows=1 width=198)
                                ->  Nested Loop  (cost=0.00..83.50 rows=1 width=161)
                                      Join Filter: (((a.stop_id)::text <> (b.stop_id)::text) AND (a.stop_sequence < b.stop_sequence) AND (a.arrival_time < b.arrival_time))
                                      ->  Nested Loop Left Join  (cost=0.00..42.66 rows=1 width=112)
                                            ->  Nested Loop  (cost=0.00..34.29 rows=1 width=90)
                                                  ->  Index Scan using st_a_s_idx on stop_time b  (cost=0.00..25.88 rows=1 width=53)
                                                        Index Cond: (((agency_id)::text = '1A'::text) AND ((stop_id)::text = 's247'::text))
                                                  ->  Index Scan using a_stop_idx on stop d  (cost=0.00..8.40 rows=1 width=44)
                                                        Index Cond: (((agency_id)::text = (b.agency_id)::text) AND ((stop_id)::text = (b.stop_id)::text))
                                            ->  Index Scan using stop_transfers_as_a_idx on stop_transfers g  (cost=0.00..8.35 rows=1 width=36)
                                                  Index Cond: (((b.agency_id)::text = (agency_id_a)::text) AND ((b.stop_id)::text = (stop_id_a)::text))
                                      ->  Index Scan using st_a_t_idx on stop_time a  (cost=0.00..40.78 rows=3 width=53)
                                            Index Cond: (((agency_id)::text = (b.agency_id)::text) AND ((trip_id)::text = (b.trip_id)::text))
                                ->  Index Scan using a_stop_idx on stop c  (cost=0.00..8.40 rows=1 width=44)
                                      Index Cond: (((agency_id)::text = (a.agency_id)::text) AND ((stop_id)::text = (a.stop_id)::text))
                          ->  Index Scan using trip_id_idx on trip e  (cost=0.00..8.71 rows=1 width=80)
                                Index Cond: ((trip_id)::text = (a.trip_id)::text)
                    ->  Index Scan using a_s_idx on calendar f  (cost=0.00..8.28 rows=1 width=20)
                          Index Cond: (((agency_id)::text = (a.agency_id)::text) AND ((service_id)::text = (e.service_id)::text))
                          Filter: monday
              ->  Index Scan using route_id_idx on route h  (cost=0.00..8.27 rows=1 width=41)
                    Index Cond: ((route_id)::text = (e.route_id)::text)
        ->  Nested Loop  (cost=168984.47..203207.60 rows=1 width=434)
              ->  Nested Loop  (cost=168984.47..203199.32 rows=1 width=477)
                    ->  Nested Loop  (cost=168984.47..203191.04 rows=1 width=520)
                          Join Filter: (((g.agency_id_b)::text = (b.agency_id)::text) AND ((g.stop_id_b)::text = (a.stop_id)::text) AND (a.arrival_time >= ((a.arrival_time + (b.arrival_time - a.arrival_time)) + ((((round((g.distance_meters / 60::double precision)))::character varying)::text || ' Minutes'::text))::interval)) AND (a.arrival_time <= (((a.arrival_time + '00:30:00'::interval) + (b.arrival_time - a.arrival_time)) + ((((round((g.distance_meters / 60::double precision)))::character varying)::text || ' Minutes'::text))::interval)))
                          ->  Nested Loop  (cost=168933.50..178461.70 rows=1 width=260)
                                ->  Nested Loop  (cost=168933.50..178453.41 rows=1 width=264)
                                      ->  Nested Loop  (cost=168933.50..178444.99 rows=1 width=227)
                                            Join Filter: (((a.stop_id)::text <> (b.stop_id)::text) AND (a.stop_sequence < b.stop_sequence) AND (a.arrival_time < b.arrival_time))
                                            ->  Nested Loop  (cost=168933.50..178404.16 rows=1 width=236)
                                                  Join Filter: ((b.agency_id)::text = (h.agency_id)::text)
                                                  ->  Nested Loop  (cost=168933.50..178387.59 rows=2 width=200)
                                                        Join Filter: ((b.agency_id)::text = (e.agency_id)::text)
                                                        ->  Nested Loop  (cost=168933.50..177724.50 rows=76 width=120)
                                                              ->  Merge Join  (cost=168933.50..170942.05 rows=869 width=89)
                                                                    Merge Cond: (((b.agency_id)::text = (g.agency_id_a)::text) AND ((b.stop_id)::text = (g.stop_id_a)::text))
                                                                    ->  Sort  (cost=144224.83..144325.07 rows=40096 width=53)
                                                                          Sort Key: b.agency_id, b.stop_id
                                                                          ->  Bitmap Heap Scan on stop_time b  (cost=1068.60..141159.25 rows=40096 width=53)
                                                                                Recheck Cond: ((agency_id)::text = '1A'::text)
                                                                                ->  Bitmap Index Scan on st_a_s_idx  (cost=0.00..1058.58 rows=40096 width=0)
                                                                                      Index Cond: ((agency_id)::text = '1A'::text)
                                                                    ->  Sort  (cost=24708.45..25274.92 rows=226587 width=36)
                                                                          Sort Key: g.agency_id_a, g.stop_id_a
                                                                          ->  Seq Scan on stop_transfers g  (cost=0.00..4553.87 rows=226587 width=36)
                                                              ->  Index Scan using a_stop_idx on stop d  (cost=0.00..7.79 rows=1 width=44)
                                                                    Index Cond: (((agency_id)::text = (b.agency_id)::text) AND ((stop_id)::text = (b.stop_id)::text))
                                                        ->  Index Scan using trip_id_idx on trip e  (cost=0.00..8.71 rows=1 width=80)
                                                              Index Cond: ((trip_id)::text = (b.trip_id)::text)
                                                  ->  Index Scan using route_id_idx on route h  (cost=0.00..8.27 rows=1 width=41)
                                                        Index Cond: ((route_id)::text = (e.route_id)::text)
                                            ->  Index Scan using st_a_t_idx on stop_time a  (cost=0.00..40.80 rows=1 width=53)
                                                  Index Cond: (((agency_id)::text = (b.agency_id)::text) AND ((trip_id)::text = (b.trip_id)::text))
                                                  Filter: ((arrival_time >= '08:11:00'::interval) AND (arrival_time <= '08:30:00'::interval) AND ((stop_id)::text = 's101'::text))
                                      ->  Index Scan using a_stop_idx on stop c  (cost=0.00..8.40 rows=1 width=44)
                                            Index Cond: (((agency_id)::text = (a.agency_id)::text) AND ((stop_id)::text = (a.stop_id)::text))
                                ->  Index Scan using a_s_idx on calendar f  (cost=0.00..8.28 rows=1 width=20)
                                      Index Cond: (((agency_id)::text = (a.agency_id)::text) AND ((service_id)::text = (e.service_id)::text))
                                      Filter: monday
                          ->  Nested Loop  (cost=50.97..24729.27 rows=1 width=260)
                                ->  Nested Loop  (cost=50.97..24720.85 rows=1 width=223)
                                      Join Filter: (((a.stop_id)::text <> (b.stop_id)::text) AND (a.stop_sequence < b.stop_sequence) AND (a.arrival_time < b.arrival_time))
                                      ->  Nested Loop  (cost=50.97..24680.01 rows=1 width=232)
                                            ->  Nested Loop  (cost=50.97..24663.42 rows=2 width=236)
                                                  Join Filter: ((b.agency_id)::text = (h.agency_id)::text)
                                                  ->  Nested Loop  (cost=50.97..24447.16 rows=29 width=200)
                                                        Join Filter: ((b.agency_id)::text = (e.agency_id)::text)
                                                        ->  Nested Loop  (cost=50.97..15148.58 rows=1096 width=120)
                                                              ->  Nested Loop  (cost=50.97..12475.29 rows=59 width=80)
                                                                    ->  Bitmap Heap Scan on stop_transfers g  (cost=50.97..2141.81 rows=1375 width=36)
                                                                          Recheck Cond: ((agency_id_b)::text = '1A'::text)
                                                                          ->  Bitmap Index Scan on stop_transfers_as_b_idx  (cost=0.00..50.63 rows=1375 width=0)
                                                                                Index Cond: ((agency_id_b)::text = '1A'::text)
                                                                    ->  Index Scan using a_stop_idx on stop d  (cost=0.00..7.50 rows=1 width=44)
                                                                          Index Cond: (((agency_id)::text = (g.agency_id_a)::text) AND ((stop_id)::text = (g.stop_id_a)::text))
                                                              ->  Index Scan using st_a_s_idx on stop_time b  (cost=0.00..45.22 rows=6 width=53)
                                                                    Index Cond: (((agency_id)::text = (d.agency_id)::text) AND ((stop_id)::text = (d.stop_id)::text))
                                                        ->  Index Scan using trip_id_idx on trip e  (cost=0.00..8.47 rows=1 width=80)
                                                              Index Cond: ((trip_id)::text = (b.trip_id)::text)
                                                  ->  Index Scan using route_id_idx on route h  (cost=0.00..7.44 rows=1 width=41)
                                                        Index Cond: ((route_id)::text = (e.route_id)::text)
                                            ->  Index Scan using a_s_idx on calendar f  (cost=0.00..8.28 rows=1 width=20)
                                                  Index Cond: (((agency_id)::text = (b.agency_id)::text) AND ((service_id)::text = (e.service_id)::text))
                                                  Filter: monday
                                      ->  Index Scan using st_a_t_idx on stop_time a  (cost=0.00..40.78 rows=3 width=53)
                                            Index Cond: (((agency_id)::text = (b.agency_id)::text) AND ((trip_id)::text = (b.trip_id)::text))
                                ->  Index Scan using a_stop_idx on stop c  (cost=0.00..8.40 rows=1 width=44)
                                      Index Cond: (((agency_id)::text = (a.agency_id)::text) AND ((stop_id)::text = (a.stop_id)::text))
                    ->  Index Scan using agency_id_idx on agency i  (cost=0.00..8.27 rows=1 width=31)
                          Index Cond: ((agency_id)::text = (a.agency_id)::text)
              ->  Index Scan using agency_id_idx on agency i  (cost=0.00..8.27 rows=1 width=31)
                    Index Cond: ((agency_id)::text = (a.agency_id)::text)
  ->  Index Scan using agency_id_idx on agency i  (cost=0.00..8.27 rows=1 width=31)
        Index Cond: ((agency_id)::text = (a.agency_id)::text)

The query plan appears to be making use of indexes. Any suggestions to optimize this or a better approach would be greatly appreciated. Thanks in advance.

Paul
  • 63
  • 5

1 Answers1

0

I think you might be better off using a tool like OpenTripPlanner (http://www.opentripplanner.org/), which is an open-source transit routing engine that works with GTFS. It can be used to quickly and efficiently answer questions all sorts of routing queries, including "fastest time between two stops allowing N transfers".

Optionally, if the agency shares their data with Google (chances are good - http://www.google.com/landing/transit/cities/index.html), then you can use the Google directions API (https://developers.google.com/maps/documentation/directions/) to query transit routing for your two input locations.

Brian Ferris
  • 7,557
  • 5
  • 25
  • 27