
Background

I'm performing an iterative traffic assignment (ITA) on a directed weighted graph with ~12k nodes and ~25k edges. In each of the four ITA iterations, I have to find the shortest paths from each origin to a set of destinations (the destinations are the same nodes as the origins). The pseudocode looks like this:

for iteration in iterations:
    for origin in origins:
        paths = shortest paths from origin to all destinations
        for destination in destinations:
            for each edge on the path from origin to destination:
                assign traffic to edge
            compute some quantities based on path properties
            compute some quantities based on path properties

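A minimal runnable sketch of the loop above, assuming a toy graph stored as a dict of dicts ({u: {v: travel_time}}) and a hypothetical demand table od_demand[origin][destination]. It assigns a fraction of each OD demand per iteration along the current shortest paths, running one single-source search per origin. The travel-time update a real ITA performs between iterations, and the "compute some quantities" step, are omitted.

```python
import heapq


def shortest_paths_from(graph, source):
    """Single-source Dijkstra: return {node: [source, ..., node]} for every reachable node."""
    dist = {source: 0.0}
    paths = {source: [source]}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry; u already has a shorter distance
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                paths[v] = paths[u] + [v]
                heapq.heappush(heap, (nd, v))
    return paths


def incremental_assignment(graph, od_demand, iteration_vals=(0.4, 0.3, 0.2, 0.1)):
    """Assign each OD demand in fractions (one fraction per iteration) along shortest paths."""
    flow = {}  # (u, v) -> traffic assigned to that edge
    for fraction in iteration_vals:
        for origin in sorted(od_demand):
            paths = shortest_paths_from(graph, origin)  # one search per origin
            for destination, demand in od_demand[origin].items():
                path = paths.get(destination)
                if path is None:
                    continue  # destination unreachable from this origin
                for u, v in zip(path, path[1:]):
                    flow[(u, v)] = flow.get((u, v), 0.0) + fraction * demand
        # A real ITA would update edge travel times from `flow` here.
    return flow


toy_graph = {"a": {"b": 1.0, "c": 4.0}, "b": {"c": 1.0}, "c": {}}
toy_demand = {"a": {"c": 10.0}}
print(incremental_assignment(toy_graph, toy_demand))
```

Because travel times are not updated in this sketch, all of the a-to-c demand follows the same path (a, b, c) in every iteration.
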
There are ~30 nodes that are origins/destinations. The code I'm using is currently in Python 2.7 and uses networkx 1.8.1 to find the shortest paths from an origin to all destinations, specifically the function networkx.single_source_dijkstra_path.

Question

One call of ITA takes ~6.2 seconds on my local machine, and about 95% of that time is spent finding these shortest paths. Since graph-tool's own documentation reports it to be 100x faster than networkx at finding shortest paths, I tried implementing the same code using graph-tool functions. Note that graph-tool's published benchmarks were run on a different machine from mine (a MacBook Pro).

I've profiled networkx (version 1.8 in Python 2.7) and graph-tool (version 2.35 in Python 3.6) on two metrics: (a) the time to complete one call of ITA and (b) the average time to find the set of paths from one origin to its destinations using each package's shortest path functions.

  • networkx (a) 6.2 seconds (b) 0.036 seconds
    • using paths_dict = networkx.single_source_dijkstra_path(G, origin, cutoff=None, weight='t_a')
  • graph-tool (a) 6.8 seconds (b) 0.050 seconds
    • using paths_dict = {destination:topology.shortest_path(G, origin, destination, weights=G.edge_properties.ta) for destination in od_dict[origin]} where topology is graph_tool.topology.

Why is graph-tool slower than networkx in my code? Is there a faster way to implement a single-origin-to-multiple-destinations shortest path search in graph-tool?

Full code

Here's the relevant portion of the ITA algorithm using graph-tool.

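import time

import graph_tool.all as gt
from graph_tool import topology
# bd is the author's own module providing build_demand() and build_od() (not shown here)


def test_traffic_assignment_graph_tool():
    # Fraction of the OD demand assigned in each iteration. These are the recommended values
    # from the Nature paper, http://www.nature.com/srep/2012/121220/srep01001/pdf/srep01001.pdf
    iteration_vals = [0.4, 0.3, 0.2, 0.1]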

    G = gt.load_graph("input/graphMTC_GB.gml")

    original_node_ids = [G.vertex_properties.label[temp] for temp in G.vertices()] # these are the original node IDs (match the networkx graph)
    new_node_ids = [G.vertex_index[v] for v in G.vertices()] # these are the new node IDs assigned by graphtool

    # Map between original and graph-tool node IDs. Both lists above come from G.vertices(),
    # which iterates in the same order each time, so zipping them is safe.
    original_to_new = dict(zip(original_node_ids, new_node_ids))
    new_to_original = dict(zip(new_node_ids, original_node_ids))

    demand = bd.build_demand('input/BATS2000_34SuperD_TripTableData.csv',
                             'input/superdistricts_centroids_dummies.csv')

    overall_start = time.time()
    paths_time = []

    # sort OD pairs to fix inconsistency across different runs of the traffic assignment
    origins = [int(i) for i in demand.keys()]  # get SD node IDs as integers
    origins.sort()  # sort them
    origins = [str(i) for i in origins]  # make them strings again

    od_dict = bd.build_od(demand)

    for i in range(len(iteration_vals)):  # do 4 iterations

        for origin in origins:
            paths_start = time.time()
            paths_dict = {
                destination: topology.shortest_path(G, original_to_new[origin],
                                                    original_to_new[destination],
                                                    weights=G.edge_properties.ta)
                for destination in od_dict[origin]
            }
            paths_time.append(time.time() - paths_start)

    overall_end = time.time()

    print('Graphtool total pathfinding time = ', sum(paths_time))
    print('Graphtool average pathfinding time = ', sum(paths_time) / len(paths_time))
Gitanjali
  • Officially, questions asking for a recommendation are "off-topic" here, as are multi-question questions, so you're unlikely to get an answer (and it would be hard to answer). A similar question asking "why is graph-tool slower than networkx" is likely to get a better response, and might well address your second question. – Joel Sep 26 '20 at 02:41
  • Hi Joel, thanks for pointing that out. I've edited the post to address your comment. – Gitanjali Sep 26 '20 at 03:24

1 Answer


The networkx command paths_dict = networkx.single_source_dijkstra_path(G, origin, cutoff=None, weight='t_a') computes the paths from origin to every reachable node in a single search. Calculating a shortest path takes a lot of work, and if you calculate several paths from the same origin separately, much of that work is repeated. Calculating all paths at once avoids that repetition.
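
A plain-Python sketch of what "all at once" means (an illustration, not networkx's actual implementation): one Dijkstra pass from the origin records, for every reachable node, its final distance and its predecessor, which together form the shortest-path tree containing the path to every destination. The adjacency format {u: {v: weight}} and the toy graph are assumptions for illustration.

```python
import heapq


def dijkstra_tree(graph, source):
    """One Dijkstra pass from `source` over {u: {v: weight}}.

    Returns (dist, pred): the shortest distance to every reachable node and the
    predecessor of each node on its shortest path (origin maps to None).
    """
    dist = {source: 0.0}
    pred = {source: None}
    heap = [(0.0, source)]
    settled = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale entry; u was already settled with a smaller distance
        settled.add(u)
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                pred[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, pred


toy = {"o": {"a": 1, "b": 5}, "a": {"b": 1, "c": 4}, "b": {"c": 1}, "c": {}}
dist, pred = dijkstra_tree(toy, "o")
print(dist, pred)
```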

The way you've set up the graph-tool call, you redo the calculation for each target node of a given origin. Each shortest_path call builds up shortest paths outward from the origin until it finally reaches its target, and then the next call starts the process over. So it does much more calculation than the networkx code.
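
A rough illustration of the extra work, counting settled nodes rather than wall-clock time on a hypothetical chain graph: an early-exit Dijkstra run once per destination (mirroring one shortest_path call per destination) is compared with a single full search from the same origin. This is a toy model of the repetition, not graph-tool's internal implementation.

```python
import heapq


def dijkstra_settled(graph, source, target=None):
    """Dijkstra over {u: {v: weight}}; return how many nodes were settled.

    If `target` is given, stop as soon as it is settled (a point-to-point query).
    """
    dist = {source: 0}
    settled = set()
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue
        settled.add(u)
        if u == target:
            break  # early exit once the target's distance is final
        for v, w in graph.get(u, {}).items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return len(settled)


# Toy chain graph 0 -> 1 -> ... -> 9 with unit weights.
chain = {i: {i + 1: 1} for i in range(9)}
chain[9] = {}
destinations = [3, 6, 9]

per_target = sum(dijkstra_settled(chain, 0, t) for t in destinations)
one_pass = dijkstra_settled(chain, 0)
print(per_target, one_pass)  # 21 10
```

Here the per-destination loop settles 21 nodes against 10 for one pass, and the gap widens as more destinations are queried.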

I don't know graph-tool well enough to know if it has an option to return shortest paths to all possible targets at once.
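
For what it's worth, graph-tool's documentation describes a one-search option: graph_tool.topology.shortest_distance(G, source, weights=..., pred_map=True) returns a predecessor map along with the distance map, and shortest_path accepts such a map through its pred_map argument, in which case it reads the path from the map instead of searching again. This has not been benchmarked on the graph above. Once a predecessor map exists, each destination's path is just a walk back to the origin; the sketch below shows that step with the predecessor map as an ordinary dict (an assumption for illustration, not graph-tool's property-map type).

```python
def path_from_predecessors(pred, origin, destination):
    """Rebuild the origin->destination path from a predecessor map.

    `pred` maps each node to its predecessor on the shortest-path tree
    (origin maps to None). Returns None if `destination` was not reached.
    """
    if destination not in pred:
        return None
    path = [destination]
    while path[-1] != origin:
        prev = pred[path[-1]]
        if prev is None:
            return None  # walked off the tree without meeting the origin
        path.append(prev)
    path.reverse()
    return path


# Predecessor map from a single search rooted at "o" (hypothetical values).
pred = {"o": None, "a": "o", "b": "a", "c": "b"}
paths_dict = {d: path_from_predecessors(pred, "o", d) for d in ["b", "c"]}
print(paths_dict)
```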

Joel