
I am using a GTFS feed for an app I am working on, and I am trying to list all of the stops for a chosen route. Currently I order the list by stop_sequence, but this does not work properly: some trips do not serve every stop, and in the data I have received, stop_sequence simply increments by 1 per stop within each trip. As a result, stop_sequence does not account for other trips that might have more or fewer stops.

Here's an example:

This is the order of the stops for a route (ignoring the fact that not every trip stops at each stop):

Stop A
Stop B
Stop C
Stop D
Stop E

Now here are some example trips for the route:

Trip 1: A, B, C, D
Trip 2: A, B, E

What my data is doing:

For Trip 1:

Stop A: stop_sequence = 1
Stop B: stop_sequence = 2
Stop C: stop_sequence = 3
Stop D: stop_sequence = 4

For Trip 2:

Stop A: stop_sequence = 1
Stop B: stop_sequence = 2
Stop E: stop_sequence = 3

So when I try to order all potential stops for a route by stop_sequence, I end up with this:

Stop A
Stop B
Stop C
Stop E
Stop D

which clearly is incorrect.
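To make the failure concrete, here is a small JavaScript sketch that collects every stop on the route and sorts by stop_sequence. The `rows` array is made-up data mirroring Trip 1 and Trip 2 above, not the real feed, and `naiveRouteOrder` is a hypothetical helper. C and E both have stop_sequence 3, so the value alone says nothing about how they relate across trips, and D (4) lands after both.

```javascript
// Hypothetical stop_times rows for the two example trips above.
const rows = [
  { trip_id: "1", stop_id: "A", stop_sequence: 1 },
  { trip_id: "1", stop_id: "B", stop_sequence: 2 },
  { trip_id: "1", stop_id: "C", stop_sequence: 3 },
  { trip_id: "1", stop_id: "D", stop_sequence: 4 },
  { trip_id: "2", stop_id: "A", stop_sequence: 1 },
  { trip_id: "2", stop_id: "B", stop_sequence: 2 },
  { trip_id: "2", stop_id: "E", stop_sequence: 3 },
];

// Naive approach: sort all rows by stop_sequence, then keep the first occurrence of each stop.
function naiveRouteOrder(stopTimes) {
  const sorted = [...stopTimes].sort((a, b) => a.stop_sequence - b.stop_sequence);
  const seen = new Set();
  const order = [];
  for (const row of sorted) {
    if (!seen.has(row.stop_id)) {
      seen.add(row.stop_id);
      order.push(row.stop_id);
    }
  }
  return order;
}

console.log(naiveRouteOrder(rows)); // [ 'A', 'B', 'C', 'E', 'D' ]
```

Because Array.prototype.sort is stable in modern JavaScript engines, the tie between C and E is broken only by the order the rows happen to appear in, which is exactly the problem.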

Does anyone have other ideas for ordering the stops correctly, perhaps using other data that comes with the GTFS feed?

UPDATE 1: a real-world example

Here is the example output of a database query that gets all of the stops for route 915. This is for the AM schedule.

+---------+---------+---------------+------------------------------------------------+
| stop_id | trip_id | stop_sequence | stop_name                                      |
+---------+---------+---------------+------------------------------------------------+
| 11771   | 1269287 |             1 | LOTTE PLAZA US 40 & US 29                      |
| 11772   | 1269280 |             1 | HARPER'S FARM RD & CEDAR LA eb                 |
| 11773   | 1269280 |             2 | LITTLE PATUXENT & GRAY STAR wb                 |
| 11774   | 1269280 |             3 | LITTLE PATUXENT & WHITE CORD WAY wb            |
| 11775   | 1269280 |             4 | LITTLE PATUXENT & BRIGHT PASSAGE eb            |
| 11776   | 1269280 |             5 | LITTLE PATUXENT & HICKORY RID nb               |
| 11777   | 1269280 |             6 | LITTLE PATUXENT & CEDAR LA eb                  |
| 11778   | 1269280 |             7 | LITTLE PATUXENT & HARPER'S FARM opp eb         |
| 11779   | 1269280 |             8 | COLUMBIA MALL & SOUTH RING RD eb               |
| 11782   | 1269280 |             9 | BROKEN LAND & HICKORY RIDGE sb                 |
| 11780   | 1269289 |             9 | LITTLE PATUXENT & GOV WARFIELD nb              |
| 11783   | 1269280 |            10 | BROKEN LAND PARK & RIDE                        |
| 11781   | 1269289 |            10 | LITTLE PATUXENT & VANTAGE PT nb                |
| 11784   | 1269280 |            11 | SCAGGSVILLE PARK & RIDE                        |
| 11785   | 1269280 |            12 | BURTONSVILLE PARK & RIDE                       |
| 11786   | 1269280 |            13 | COLESVILLE RD  & FENTON ST sb                  |
| 11787   | 1269280 |            14 | SILVER SPRING METRO STATION                    |
| 11788   | 1269280 |            15 | WALTER REED HOSP & 16TH ST NW                  |
| 11789   | 1269280 |            16 | 16TH ST & P ST NW                              |
| 11790   | 1269280 |            17 | 16TH ST & M ST NW                              |
| 11718   | 1269280 |            18 | K ST & 16TH ST NW fs eb                        |
| 11719   | 1269280 |            19 | K ST & 14TH ST NW eb                           |
| 11791   | 1269280 |            20 | 13TH ST & H ST NW sb                           |
| 11759   | 1269280 |            21 | PENNSYLVANIA AVE & 12TH ST NW eb               |
| 11793   | 1269280 |            22 | CONSTITUTION AVE & 10TH ST NW fs eb            |
| 12046   | 1269280 |            23 | 7TH ST NW & CONSTITUTION AVE eb                |
| 11650   | 1269280 |            24 | INDEPENDENCE AVE & 7/6 ST SW mid eb            |
| 11601   | 1269280 |            25 | INDEPENDENCE AVE & 4TH/3RD ST SW eb            |
| 13627   | 1269280 |            26 | M ST & 1st ST SE (NAVY YARD) sb                |
| 13628   | 1269280 |            27 | M ST & 4th ST SE (SOUTHEAST FEDERAL CENTER) eb |
| 11569   | 1269280 |            28 | M ST & ISAAC HALL AVE SE eb                    |
| 11795   | 1269280 |            29 | M ST & 8/9TH STS mid eb                        |
+---------+---------+---------------+------------------------------------------------+

And here is a link to the PDF schedule that many commuters currently use. The two lists first differ after "COLUMBIA MALL & SOUTH RING RD eb".

http://mta.maryland.gov/sites/default/files/915May2011B.pdf

I am trying to make this app as commuter-friendly as possible, but if the stops are out of order compared to the schedule commuters usually use, it might cause a lot of confusion.

UPDATE 2:

I still do not see how topological sorting can be used to get the correct sequence. Yes, it might give a valid sequence, but it is not guaranteed to be the correct sequence that a commuter will easily recognize. Let's look at another example using the PDF I provided: Trips 1 and 5, up until the stop "Columbia Mall". I would create the following edges:

Edges created from Trip 1

Cedar Lane --> Gray Star Way
Gray Star Way --> White Cord Way
...
Harpers Farm Rd --> Columbia Mall

Edges created from Trip 5

Lotte Plaza --> Columbia Mall

The only thing that a topological sorting ensures is

for every directed edge uv from vertex u to vertex v, u comes before v in the ordering

That means there are multiple valid orderings, but only one is the correct one that I want (and there is no way for me to programmatically choose it over the other valid orderings, at least not that I can think of).

A valid ordering might be (this is also the correct one):

Lotte Plaza
Cedar Lane
Gray Star
...
Columbia Mall

or even

Cedar Lane
Gray Star
...
Lotte Plaza
Columbia Mall

As you can see, according to a topological sort, both of these are valid, but only one of them is the one I want. I cannot think of a way to consistently choose the correct sequence based upon the data provided by the GTFS feed.
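The ambiguity is easy to demonstrate. This JavaScript sketch enumerates every topological ordering of a shortened version of the Trip 1 / Trip 5 edges; the elided intermediate stops are simply left out, so Gray Star links straight to Columbia Mall. `allTopologicalOrders` is a hypothetical helper written for illustration. Lotte Plaza only has an edge into Columbia Mall, so it can legally sit at any position before it.

```javascript
// Return every topological ordering of a directed graph given as [from, to] pairs.
// Backtracking is exponential in general; this is only meant for tiny illustrative graphs.
function allTopologicalOrders(edges) {
  const nodes = new Set();
  const succ = new Map();
  const indegree = new Map();
  for (const [from, to] of edges) {
    for (const n of [from, to]) {
      nodes.add(n);
      if (!succ.has(n)) succ.set(n, []);
      if (!indegree.has(n)) indegree.set(n, 0);
    }
    succ.get(from).push(to);
    indegree.set(to, indegree.get(to) + 1);
  }

  const results = [];
  const path = [];
  const used = new Set();

  const backtrack = () => {
    if (path.length === nodes.size) {
      results.push([...path]);
      return;
    }
    for (const n of nodes) {
      if (used.has(n) || indegree.get(n) !== 0) continue;
      // Place n, release its successors, recurse, then undo the choice.
      used.add(n);
      path.push(n);
      for (const m of succ.get(n)) indegree.set(m, indegree.get(m) - 1);
      backtrack();
      for (const m of succ.get(n)) indegree.set(m, indegree.get(m) + 1);
      path.pop();
      used.delete(n);
    }
  };

  backtrack();
  return results;
}

// Shortened edges from Trip 1 and Trip 5 (intermediate stops omitted).
const edges = [
  ["Cedar Lane", "Gray Star"],
  ["Gray Star", "Columbia Mall"],
  ["Lotte Plaza", "Columbia Mall"],
];

for (const order of allTopologicalOrders(edges)) console.log(order.join(" -> "));
```

All three printed orderings satisfy every edge, including both orderings listed above, so the graph by itself cannot single out the one commuters expect.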

Please let me know if I am looking at this the wrong way.

btse
  • Remember that stop_sequence can only be used to order stops belonging to the same trip (aka rows with the same trip_id in stop_times.txt). As I mentioned below, to create a more general ordering of stops across different trips (aka different sequences of stops), you'd need to use the method I outlined below. – Brian Ferris Jul 07 '13 at 19:24
  • I added another update with why I cannot imagine how a topological sort will achieve what I want. – btse Jul 07 '13 at 20:22
  • Maybe this diagram will help: https://docs.google.com/file/d/0B2T8yNIP0VUQeG1fUVY2X25jcGs/edit I agree that the Lotte Plaza stop is tricky, regarding how to order it relative to the Harper's Farm variation. I've updated my answer below to describe how to handle forks, branches, and other variations in a natural way. Definitely take a look at the OBA code, as it already handles this case. – Brian Ferris Jul 07 '13 at 22:17
  • Thank you for all the help. I will take a long hard look at the OBA code and try to solve this problem myself before bothering you anymore! – btse Jul 07 '13 at 22:42
  • @btse Did you ever figure this out? I'm about to give up on my GTFS pet project for this same reason. I can't figure out a way to list all stops for all branches! – Julian Feb 23 '15 at 21:42
  • @Julian No unfortunately! Around the time I got stuck on this problem I got a job at NASA and I haven't had a chance to work on my app since then. – btse Feb 24 '15 at 01:05
  • @btse holy crap, NASA! Congrats! Anyway, my last resort (just in case anyone reads this) is listing every single trip for every route and gathering the results. So far it's been working great, though it takes a couple of minutes to run through a 700 MB GTFS database. – Julian Feb 24 '15 at 04:23
  • I've been struggling for like 6 days and wrote like 6 algorithms, but none of them worked with all the current GTFS feeds. I tried grouping (by direction/stops/stop count) first and then merging every type of "route line" into a single one using angles, distances and pattern matching... but there is always an exception in one of the GTFS feeds where the structure is arbitrary. The main problem is that one route can use the same stop (multiple times) while heading in the opposite direction... In real life you can't do that: how would you know which bus to get on to go in the direction you're actually heading!? – Eric Liu Oct 07 '17 at 13:06
  • * using the same stop on the same route with the same headsign, like lasso trips... and the funny part is when you have branches on it – Eric Liu Oct 07 '17 at 13:18

3 Answers


You could construct a directed acyclic graph (DAG) where each stop belonging to a route is a node and each transition between two consecutive stops in a trip is an edge. Then you could perform a topological sort of the graph (http://en.wikipedia.org/wiki/Topological_sorting) to get an ordering of the stops. Note that topological sorting only works for graphs that have no cycles, but some trips do in fact contain cycles, so you would not want to add an edge if it would create a cycle.

This happens to be the algorithm used by the OneBusAway application suite for ordering stops: https://github.com/OneBusAway/onebusaway-application-modules/blob/master/onebusaway-transit-data-federation/src/main/java/org/onebusaway/transit_data_federation/impl/beans/RouteBeanServiceImpl.java#L281
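A minimal JavaScript sketch of the cycle-skipping idea from the first paragraph follows. It is not a port of the OneBusAway code; `orderRouteStops` is a hypothetical name, and each trip is assumed to be an array of stop IDs already ordered by stop_sequence. An edge from -> to is skipped when `from` is already reachable from `to`, since adding it would close a cycle, and the resulting DAG is sorted with Kahn's algorithm.

```javascript
// Sketch: build a DAG from trips, skipping any edge that would create a cycle, then topologically sort.
// Each trip is assumed to be an array of stop IDs already ordered by stop_sequence.
function orderRouteStops(trips) {
  const succ = new Map(); // stop ID -> Set of stop IDs that follow it in some trip
  const addNode = (n) => {
    if (!succ.has(n)) succ.set(n, new Set());
  };

  // True if `target` can be reached from `start` via the edges added so far.
  const reaches = (start, target) => {
    const stack = [start];
    const seen = new Set();
    while (stack.length > 0) {
      const n = stack.pop();
      if (n === target) return true;
      if (seen.has(n)) continue;
      seen.add(n);
      for (const m of succ.get(n) ?? []) stack.push(m);
    }
    return false;
  };

  for (const stops of trips) {
    stops.forEach(addNode);
    for (let i = 1; i < stops.length; i++) {
      const from = stops[i - 1];
      const to = stops[i];
      // from -> to closes a cycle exactly when `from` is already reachable from `to`.
      if (!reaches(to, from)) succ.get(from).add(to);
    }
  }

  // Kahn's algorithm on the resulting acyclic graph.
  const indegree = new Map([...succ.keys()].map((n) => [n, 0]));
  for (const targets of succ.values()) {
    for (const t of targets) indegree.set(t, indegree.get(t) + 1);
  }
  const queue = [...succ.keys()].filter((n) => indegree.get(n) === 0);
  const order = [];
  while (queue.length > 0) {
    const n = queue.shift();
    order.push(n);
    for (const m of succ.get(n)) {
      indegree.set(m, indegree.get(m) - 1);
      if (indegree.get(m) === 0) queue.push(m);
    }
  }
  return order;
}

// A loop trip (A -> B -> C -> A) plus a second trip branching off at B.
console.log(orderRouteStops([["A", "B", "C", "A"], ["A", "B", "D"]])); // [ 'A', 'B', 'C', 'D' ]
```

Which edge gets dropped depends on the order trips are processed in, and a plain FIFO queue like this one can interleave separate branches, which is what the heuristics below address.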

Note that sometimes routes will have forks or branches, where there are two sets of stops (one for each branch) that don't interact with each other. A naive topological sort might arbitrarily interleave these stops, but the OBA code uses the following two heuristics to get a more natural ordering:

1) Group stops in the same branch together.

2) When ordering two branches relative to each other, put the branch closer in distance to the branch point first.
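The two heuristics can be approximated with a modified Kahn's algorithm. This is a hypothetical JavaScript sketch, not the OBA implementation: `orderWithBranches` and the `coords` map of made-up planar `{ x, y }` positions are assumptions (real stops would use latitude/longitude and a geodesic distance). Rule 1 prefers a ready stop that directly follows the last placed stop; rule 2 ranks candidates by their distance to their nearest predecessor, which for the first stop of a branch is the branch point.

```javascript
// Hypothetical sketch of the two heuristics on top of Kahn's algorithm.
// `coords` maps stop ID -> { x, y } on a made-up planar grid.
function orderWithBranches(edges, coords) {
  const succ = new Map();
  const pred = new Map();
  const indegree = new Map();
  const addNode = (n) => {
    if (!succ.has(n)) {
      succ.set(n, new Set());
      pred.set(n, new Set());
      indegree.set(n, 0);
    }
  };
  for (const [from, to] of edges) {
    addNode(from);
    addNode(to);
    if (!succ.get(from).has(to)) {
      succ.get(from).add(to);
      pred.get(to).add(from);
      indegree.set(to, indegree.get(to) + 1);
    }
  }

  const dist = (a, b) => Math.hypot(coords[a].x - coords[b].x, coords[a].y - coords[b].y);
  // Distance to the nearest predecessor (all are already placed); Infinity if there is none.
  const distToPred = (n) => Math.min(Infinity, ...[...pred.get(n)].map((p) => dist(n, p)));
  const byDistance = (a, b) => {
    const da = distToPred(a);
    const db = distToPred(b);
    return da === db ? 0 : da < db ? -1 : 1;
  };

  const ready = [...indegree.keys()].filter((n) => indegree.get(n) === 0);
  const order = [];
  while (ready.length > 0) {
    const last = order[order.length - 1];
    // Rule 1: stay on the current branch if any ready stop follows the last placed stop.
    const following = ready.filter((n) => last !== undefined && succ.get(last).has(n));
    // Rule 2: among the candidates, take the one closest to its branch point / predecessor.
    const candidates = (following.length > 0 ? following : [...ready]).sort(byDistance);
    const next = candidates[0];
    ready.splice(ready.indexOf(next), 1);
    order.push(next);
    for (const m of succ.get(next)) {
      indegree.set(m, indegree.get(m) - 1);
      if (indegree.get(m) === 0) ready.push(m);
    }
  }
  return order;
}

// A fork at B: one branch B -> C -> D, another B -> E -> F, where E is nearer to B than C is.
const forkEdges = [["A", "B"], ["B", "C"], ["C", "D"], ["B", "E"], ["E", "F"]];
const grid = {
  A: { x: 0, y: 0 }, B: { x: 1, y: 0 }, C: { x: 5, y: 0 },
  D: { x: 6, y: 0 }, E: { x: 1, y: 1 }, F: { x: 1, y: 2 },
};
console.log(orderWithBranches(forkEdges, grid)); // [ 'A', 'B', 'E', 'F', 'C', 'D' ]
```

On the same edges, a plain FIFO Kahn's sort would produce A, B, C, E, D, F, interleaving the two branches.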

Brian Ferris
  • I will try this and let you know how it goes. It's such a pain that all of this extra work is required when the people creating the GTFS feed could have just used the stop_sequence field more intelligently. – btse Jul 06 '13 at 08:03
  • I am not too sure this will work, since data on each stop-to-stop transition only exists within the same trip. Therefore, there is no way for me to properly draw edges between stops from different trips. To relate this back to my example, there is no way for me to determine that `Stop C` should actually come before `Stop E` (at least not that I can figure out). – btse Jul 07 '13 at 17:49
  • And why exactly should stop C come before stop E (English alphabet aside)? Perhaps your route has a branch at the end, with half the buses going to C-D and the other half going to E. In such a situation is there really a "proper" ordering of stops? – Brian Ferris Jul 07 '13 at 18:37
  • haha sorry Brian! I was not trying to be disrespectful with my post on Google groups. I am going to update my original post with another example since I cannot respond properly in this comment. – btse Jul 07 '13 at 18:55

For anyone who comes across this question, this is how I solved the problem some 'years' ago.

There is no one correct sequence: the goal here was to produce a 'visually optimal' sequence (in the majority of cases). Rather than looking at the individual stops, I've grouped stops together into logical sections and then merged those sections together in a process not too dissimilar to topological sorting.

You can then add additional rules/weightings for unrelated sections to determine which section should take precedence over another, e.g. ABC ---> CDE or GHI.

https://github.com/mhkey/supersequence
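The linked repository is the author's implementation; the JavaScript sketch below is not taken from it. It shows one standard building block for merging stop sequences: the shortest common supersequence of two sequences, computed from a longest-common-subsequence table. `mergeSequences` is a hypothetical name, and when either sequence could supply the next stop, it arbitrarily prefers the first one, which is where a precedence or weighting rule would plug in.

```javascript
// Shortest common supersequence (SCS) of two stop sequences, built from an LCS table.
// Stops shared by both sequences appear once; all other stops keep their relative order.
function mergeSequences(a, b) {
  const n = a.length;
  const m = b.length;
  // lcs[i][j] = length of the longest common subsequence of a.slice(i) and b.slice(j).
  const lcs = Array.from({ length: n + 1 }, () => new Array(m + 1).fill(0));
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j] ? lcs[i + 1][j + 1] + 1 : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  const out = [];
  let i = 0;
  let j = 0;
  while (i < n && j < m) {
    if (a[i] === b[j]) {
      out.push(a[i]); // shared stop: emit it once
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      // Emitting a[i] keeps the LCS intact; ties favour `a` (a precedence rule could go here).
      out.push(a[i++]);
    } else {
      out.push(b[j++]);
    }
  }
  // One sequence is exhausted; append the rest of the other.
  return out.concat(a.slice(i), b.slice(j));
}

console.log(mergeSequences(["A", "B", "C", "D"], ["A", "B", "E"])); // [ 'A', 'B', 'C', 'D', 'E' ]
```

For more than two trips you could fold this over the trips pairwise, but that is a heuristic: the shortest common supersequence of many sequences is NP-hard in general.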

Superfy

It would be simple enough to build and sort a DAG (directed acyclic graph) of the stops. Essentially, we are combining the stop sequences of every single trip into one overall stop sequence.

The only annoying part is that you must process all trips ahead of time to make sure all stops are covered, so this may take some time depending on how many trips you have in your system.

We first need some code to sort a DAG. Keep in mind that the following JavaScript code has not been extensively tested.

/**
 * This function sorts a directed acyclic graph (DAG) using depth-first search (DFS).
 * 
 * @example
 * 
 * const edges = [
 *   ["a", "b"],
 *   ["a", "c"],
 *   ["a", "e"],
 *   ["b", "d"],
 *   ["c", "d"],
 *   ["d", "e"],
 * ];
 * 
 * const order = sort_dag_dfs(edges); // ["a", "c", "b", "d", "e"]
 */
export const sort_dag_dfs = (edges) => {
    const nodes = new Set();
    const edges_map = new Map();

    for (const [from, to] of edges) {
        nodes.add(from);
        nodes.add(to);

        if (!edges_map.has(from)) {
            edges_map.set(from, new Set());
        }

        edges_map.get(from).add(to);
    }

    const visited = new Set();
    const stack = [];

    const dfs = (node) => {
        if (visited.has(node)) {
            return;
        }

        visited.add(node);

        if (edges_map.has(node)) {
            for (const to of edges_map.get(node)) {
                dfs(to);
            }
        }

        stack.push(node);
    };

    for (const node of nodes) {
        dfs(node);
    }

    return stack.reverse();
};

Now we need to iterate all trips for a route. For each trip, we add an "edge" for each pair of consecutive stops, which is essentially a constraint that one stop must come after the other. These constraints are combined to obtain a final sequence.

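Here is a sketch of that step. It assumes `trips` is an array of objects whose `stop_times` property holds that trip's rows from stop_times.txt (each with `stop_id` and `stop_sequence`); adapt it to however you load the feed:

const edges = new Map(); // "from\u0000to" -> [from, to], so each edge is stored only once

for (const trip of trips) {
    // stop_sequence only orders stops within a single trip, so sort each trip separately.
    const stops = [...trip.stop_times]
        .sort((a, b) => a.stop_sequence - b.stop_sequence)
        .map((stop_time) => stop_time.stop_id);

    // Each pair of consecutive stops becomes an edge (a "comes before" constraint).
    for (let i = 1; i < stops.length; i++) {
        const [from, to] = [stops[i - 1], stops[i]];
        edges.set(`${from}\u0000${to}`, [from, to]);
    }
}

// Note: sort_dag_dfs assumes the graph is acyclic; loop trips would need cycle-creating edges skipped first.
const sorted_stop_ids = sort_dag_dfs([...edges.values()]);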

Note that there can be multiple correct orderings, and it may be worth further refining the ordering based on the GPS coordinates of the stops. If two potential stops can come next (e.g. at a branch), it may be worth choosing the stop that is closer to the previous stop.
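A rough JavaScript sketch of that suggestion: Kahn's algorithm where, whenever several stops are ready to be placed, the one with the smallest great-circle (haversine) distance to the previously placed stop wins. `sortStopsNearestFirst` and the `coords` map of stop ID to `{ lat, lon }` are hypothetical; in a GTFS feed the coordinates would come from `stop_lat`/`stop_lon` in stops.txt.

```javascript
// Great-circle distance in kilometres between two { lat, lon } points (haversine formula).
function haversineKm(p, q) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const R = 6371; // mean Earth radius in km
  const dLat = toRad(q.lat - p.lat);
  const dLon = toRad(q.lon - p.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(p.lat)) * Math.cos(toRad(q.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(h));
}

// Kahn's algorithm that, when several stops are ready, places the one nearest to the previous stop.
function sortStopsNearestFirst(edges, coords) {
  const succ = new Map();
  const indegree = new Map();
  for (const n of edges.flat()) {
    if (!succ.has(n)) {
      succ.set(n, new Set());
      indegree.set(n, 0);
    }
  }
  for (const [from, to] of edges) {
    if (!succ.get(from).has(to)) {
      succ.get(from).add(to);
      indegree.set(to, indegree.get(to) + 1);
    }
  }

  let ready = [...indegree.keys()].filter((n) => indegree.get(n) === 0);
  const order = [];
  while (ready.length > 0) {
    const prev = order[order.length - 1];
    // With nothing placed yet, take the first ready stop; otherwise the closest one.
    let best = ready[0];
    if (prev !== undefined) {
      for (const n of ready) {
        if (haversineKm(coords[prev], coords[n]) < haversineKm(coords[prev], coords[best])) best = n;
      }
    }
    ready = ready.filter((n) => n !== best);
    order.push(best);
    for (const m of succ.get(best)) {
      indegree.set(m, indegree.get(m) - 1);
      if (indegree.get(m) === 0) ready.push(m);
    }
  }
  return order;
}

// Made-up coordinates: C and D continue north from B, while E lies well to the east.
const stopCoords = {
  A: { lat: 39.0, lon: -76.9 },
  B: { lat: 39.01, lon: -76.9 },
  C: { lat: 39.02, lon: -76.9 },
  D: { lat: 39.03, lon: -76.9 },
  E: { lat: 39.01, lon: -76.8 },
};
const routeEdges = [["A", "B"], ["B", "C"], ["C", "D"], ["B", "E"]];
console.log(sortStopsNearestFirst(routeEdges, stopCoords)); // [ 'A', 'B', 'C', 'D', 'E' ]
```

Nearest-previous-stop is a greedy rule, so it can still pick a poor order when branches start close together; it only breaks ties that the DAG leaves open.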

David Callanan