
I have a directed graph with non-negative edge weights, where there can be multiple edges between the same pair of vertices.

I need to compute all-pairs shortest paths. The graph is very big (20 million vertices and 100 million edges). Is Floyd–Warshall the best algorithm? Is there a good library or tool for this task?

ross94

1 Answer


There exist several all-to-all shortest path algorithms for directed graphs without negative cycles, Floyd–Warshall probably being the most famous, but with the figures you gave, I think you will have memory issues in any case (time could be an issue too, but you can find all-to-all algorithms that are easily and massively parallelizable).
Independently of the algorithm you use, you will have to store the result somewhere. And storing 20,000,000² = 400,000,000,000,000 path lengths (let alone the full paths themselves) would use over a petabyte of storage, even at 4 bytes per length.
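A quick back-of-the-envelope check of that figure (plain Python, assuming a dense float32 distance matrix):

```python
n = 20_000_000                 # vertices
bytes_per_entry = 4            # one float32 distance per ordered pair
total_bytes = n * n * bytes_per_entry
print(total_bytes / 1e15)      # 1.6 -> about 1.6 petabytes, distances only
```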
Accessing any of these stored results would probably take longer than computing one shortest path from scratch (the memory wall), which can be done in less than a millisecond (depending on the graph structure, you can find techniques that are much, much faster than Dijkstra or any other priority-queue-based algorithm).
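To make the on-demand alternative concrete, here is a minimal sketch using `scipy.sparse.csgraph`; the edge arrays are toy stand-ins for your real edge list. Note that your parallel edges must be collapsed to their minimum weight first, because `csr_matrix` sums duplicate entries, which is wrong for shortest paths:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

# Toy multigraph edge list (the real one would have ~1e8 rows):
# two parallel edges 0 -> 1, with weights 5.0 and 2.0.
src = np.array([0, 0, 1, 2])
dst = np.array([1, 1, 2, 3])
w   = np.array([5.0, 2.0, 1.0, 4.0])

# Sort so the cheapest copy of each (src, dst) pair comes first,
# then keep only the first occurrence of each pair.
order = np.lexsort((w, dst, src))
src, dst, w = src[order], dst[order], w[order]
keep = np.ones(len(src), dtype=bool)
keep[1:] = (src[1:] != src[:-1]) | (dst[1:] != dst[:-1])

n = 4
g = csr_matrix((w[keep], (src[keep], dst[keep])), shape=(n, n))

# One single-source query on demand, instead of a precomputed
# 4e14-entry table.
dist = dijkstra(g, directed=True, indices=0)
print(dist)   # [0. 2. 3. 7.]
```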

To be honest, I think you should look for an alternative where computing all-to-all shortest paths is not required. Or study the structure of your graph (a DAG, a well-structured graph that is easy to partition/cluster, geometric/geographic information ...) in order to apply different algorithms, because in the general case I do not see any way around it.
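As a starting point for that structural study, one cheap probe (reusing the matrix `g` from the sketch above) is to count strongly connected components; the graph is a DAG exactly when every component is a single vertex:

```python
from scipy.sparse.csgraph import connected_components

# Reuses the csr_matrix `g` built in the previous sketch.
n_components, labels = connected_components(g, directed=True, connection='strong')
print(n_components == g.shape[0])   # True iff the graph is a DAG
```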

For example, with the figures you gave, an average degree of about 5 (100 million edges over 20 million vertices) makes for a decently sparse graph, considering its dimensions. Graph partitioning approaches could then be very useful.
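If you go the partitioning route, pymetis (a Python wrapper around METIS, one option among several) can split the vertex set; this is only a sketch on a toy adjacency list, and METIS expects an undirected graph, so your directed edges would need to be symmetrized first:

```python
import pymetis  # pip install pymetis

# Toy undirected adjacency list (neighbors per vertex); a real run would
# build this from the symmetrized 100M-edge list.
adjacency = [[1], [0, 2], [1, 3], [2]]

n_cuts, membership = pymetis.part_graph(2, adjacency=adjacency)
print(n_cuts, membership)  # number of cut edges; part index per vertex
```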

m.raynal
  • Thank you for the response, your point of view is very interesting; I had not considered memory before. Do you also know of a library or tool for managing a graph like this? – ross94 Mar 20 '19 at 15:37
  • SAGE and Boost can provide good tools for dealing with massive graphs, for example, but they are already high-end tools with a steep learning curve. Some Python libraries can also deal "efficiently" with big objects (scipy, numpy) ... If you want to find "neighbors", KD-trees are a nice tool. Clustering can also be useful. I'd suggest you have a look at section 2 of [this paper](https://arxiv.org/pdf/1504.05140.pdf), where several nice speed-up techniques for shortest paths are explained. – m.raynal Mar 20 '19 at 15:55
  • Thank you for your help! I will try these tools and read the paper. – ross94 Mar 21 '19 at 16:25