0

I am using a GTFS relational database to find the fastest trip time offered between two given points. The fields I have are trip_id, arrival_time and departure_time (in minutes from midnight), and stop_id

A tranche of the data might be:

trip_id | arrival_time | departure_time | stop_id
1       | 5            | 5              | 90001
1       | 8            | 8              | 90002
1       | 10           | 10             | 90003
1       | 15           | 15             | 90004
2       | 25           | 25             | 90001
2       | 28           | 28             | 90002
2       | 32           | 33             | 90003
2       | 38           | 38             | 90004
3       | 35           | 35             | 90001
3       | 38           | 38             | 90002
3       | 48           | 48             | 90004
4       | 8            | 8              | 90003
4       | 10           | 10             | 90004
4       | 15           | 15             | 90005

I am looking for the shortest time (and associated trip) between the departure from stop 90001 and the arrival at stop 90003. The answer is 5 minutes for trip '1' which beats the 8 minutes for trip 2 and the fact that trip 3 doesn't stop at 90003. Also, the time differences have to be on the same trip the output can't be 3 minutes from a combination of trip 1 and 4.

How do I write an SQL statement to do this? So far I have failed miserably:

SELECT trip_id, arrival_time, departure_time, stop_id, MIN(departure_time-arrival_time)
FROM stop_times
WHERE stop_id IN (90001, 90003)
GROUP BY trip_id

(I have already subset the data so all trips are in the direction I want so I will not have to flip the arrival and departure on the fly. It will always be the time between the departure_time of the first stop_id and the arrival_time of the second.)

gren
  • 17
  • 3

1 Answers1

1

I think this should work using max and case with limit:

select trip_id, 
  max(case when stop_id = 90003 then arrival_time end)
  - max(case when stop_id = 90001 then departure_time end)
from stop_times
where stop_id in (90001,90003)
group by trip_id
order by 1
limit 1

I will note that for trip_id 2, this returns 7 instead of 8 (32-25). But perhaps it should use the departure_time for both case statements.

Community
  • 1
  • 1
sgeddes
  • 62,311
  • 6
  • 61
  • 83
  • It worked on my test case but not on my larger data. When run, it returned a trip_id for a trip that is not on the same line as my second stop and NA for the minutes. Yet, if I remove "limit 1", I get an array of NAs and values (as strings, oddly). If I convert them back to numbers and find the minimum, that value appears to be the proper minimum trip time (and the other values are all plausible trip times). So, your code is great except for that one issue. I'm going to play around with it and see if I can figure out why it isn't working. If you have any ideas, please let me know. Thanks! – gren Nov 26 '14 at 04:51