Computing graph from the distances between points

Question

I have a set of geographical points (lat, lon) and I want to compute a directed graph where:

the nodes are those points
an edge X->Y (between nodes X and Y) exists if the fastest path between X and Y doesn't pass through another node Z.

I am able to compute the duration of the path between any pair of nodes. Right now, I'm doing the following:

compute the durations between every pair of nodes
for every pair of nodes X,Y, there is an edge between X and Y if there is no node Z such that the duration of X->Z plus the duration of Z->Y is the same as the duration of X->Y.

I have tested this approach for a subset of the nodes and it seems to work, but since I have around 2000 nodes and the computation of the duration between nodes is computationally expensive (because it involves calculating the shortest path), I would like to know if there is a better approach.

Some additional (probably not relevant) info:

The nodes are bus stop locations, taken from a GTFS feed
I'm calculating the shortest paths durations using http://project-osrm.org/

Any help will be greatly appreciated

Look up `Minimum Spanning Tree (MST)`, I believe that is exactly what you seek. There are much faster ways to build them. Also, look into the API documentation for "distance matrix" (they refer to it as "table service") that will return a 200x200 matrix telling you the distances between everything. Then you just need to build the MST from that. http://project-osrm.org/docs/v5.23.0/api/#table-service — Nuclearman, Oct 23 '20 at 07:51
Though likely the table service can be optimized as you really don't need to know all 200x200 distances. Get the tables service for the nearest X locations to that would likely work fine, assuming at least one of those X is within your 200 locations. Otherwise, table service is probably the best route. — Nuclearman, Oct 23 '20 at 07:58
@Nuclearman Thank you for your answer. However, MST is for undirected graphs, and my graph has to be directed (because it corresponds to a road network). I came across Chu–Liu/Edmonds' algorithm, which is the equivalent of MST for directed graphs,, but that produces a graph in which every node has only one incoming edge, which doesn't hold in my case. — megalodon, Oct 24 '20 at 15:48
Hmmm, added an answer, but worst case, not sure you can do much better than all paths if your shortest paths are very tangled. The approach basically tries to start with a triangulation to estimate what should be the shortest paths, then tries to verify that they actually are the shortest paths. — Nuclearman, Oct 25 '20 at 00:35

score 0 · Accepted Answer · answered Oct 25 '20 at 00:30

I would start by running a Delaunay triangulation, this will give you an idea of what the path structure "should" look like (in theory). In theory, once you have the delaunary triangulation, it is easy to find the shortest paths. In practice however, you are using road networks instead of point-to-point distances, so you have to challenge those assumptions.

First build a way that given the Delaunay triangulation. Example of what the triangulation might look like below. Let's say that points HJG are within a city center and the points around it are a freeway that surrounds it. Say with a 65 mph or so speed limit. This means going from ABCG might be faster than AHG.
For every edge in the Delaunay triangulation. Determine the duration of the edge in both directions. This will give you your initial weights (it may also effect the triangulation if you can determine that the shortest path for AH is actually ABH). If there isn't a way to go around for faster routes. The ABCG being faster than AHG example. Then you are probably done at this point. If not you'll have to try to contract edges, which depending on your data, might require doing all routes anyway as you can't really be sure some obscure path allows it to be faster. Sadly, triangulation can't account for differences in speed between edges.

If you can set it up to tell you which point it does go through, then you can avoid having to run it on all paths. IE: If you can say have it tell you the fastest path AF, and it says it uses AHGF, then you can be reasonably sure that AHG and HGF are the shortest paths between AG and HF, and you don't need to run those. However, if you find that AF actually takes ABCEF instead, then there is clearly a great speed difference between the routes, and you should also check the fastest route to AG to ensure it's actually AHG. Whenever you find a faster edge, be sure to add to the triangulation. You may actually be able to add multiple edges. Basically the idea here is that the triangulation gives you an idea of what should be fastest routes if speed doesn't matter. However, you need to confirm or reject that hypothesis. Basically for each node. Find the shortest paths to all points via BFS. Take the longest path and explore it.
To aid in #3, it would likely help to vertex path covering of the edges in the graph. Create paths to all the furthest points from each point until all points are covered. IE: ABCD ABCE AHGF AHJKL AIK (also the reverse order as it is directional). Then you can run shortest path from AD, AE, AF, AL, and AK. If those paths match the expected paths, then you are done. If not, you'll be refining things along the way in the method used for #3 above. Worst case though, I'm not sure there's a way to get this much faster than all shortest paths in theory this approach is roughly O(VE).
In theory #4 requires starting at every vertex. However, vertices can be skipped if the connecting paths have been valid as being the shortest path, so I would likely recommend starting checking vertices with the least number of (un-validated?) edges. Though again, you may need to do all pairs to be 100% sure only the shortest edges connect each node.
For testing I would recommend doing this: Randomly select say 10-200 of those 2000 points. Run the algorithm on those 10-200 points. Then run all paths on those points. If they don't agree take a close look at why they don't agree. Feel free to add any "exceptional" cases to your answer and comment here. Repeat many times and see if there is any disagreement. Note all disagreements. If there are no disagreements, try running on the entire 2000 point set. I would strongly recommend doing this test after step 2. There is a chance that the Delaunay graph is very close to the optimal, depending on your dataset.

In the worst case, it might be useful to create a 2000X2000 table that is used to verify shortest paths. And mark edges as you have verified they are the shortest edges. In theory, checking that those are the shortest paths is roughly V^2. Which is perhaps better than all shortest paths if you have to do BFS to find the shortest path. — Nuclearman, Oct 25 '20 at 00:38
Thank you very much for your detailed explanation. I will give it a try and report my findings here. Cheers — megalodon, Oct 27 '20 at 21:14

Computing graph from the distances between points

1 Answers1