Accelerate a loop in Python

Question

I am running a loop that builds a networkx.classes.multidigraph.MultiDiGraph for each row (neighbourhood) of each GeoDataFrame (city) in a list. It then computes some statistics for each row and writes the result to disk. The problem is that the loop takes a very long time to run, because a graph is built for every single row. I'm looking for a way to speed it up.

I tried building the graph once for each whole GeoDataFrame and then clipping it to the neighbourhood boundaries, but I don't know of a way to clip a NetworkX graph.

Here is my initial code, which takes 95 seconds:

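import osmnx as ox
import pandas as pd
import geopandas as gpd
import os

path = "C:/folder/"
files = [os.path.join(path, f) for f in os.listdir(path)]
merged = []

# Only the first two cities (trimmed down for this question)
for i in range(2):
    city = gpd.read_file(files[i])
    circ = []

    # ...and only the first two neighbourhoods of each city
    for j in range(2):
        # The slow step: one OSM query and graph build per neighbourhood
        graph_for_row = ox.graph_from_polygon(city.geometry.iloc[j])
        stat = ox.basic_stats(graph_for_row)
        circ.append(stat['circuity_avg'])

    circ = pd.Series(circ, name='circuity_avg')
    merged.append(pd.concat([city, circ], axis=1))

# Hypothetical output naming: '<input name>_circuity.geojson' in the same folder
for src, gdf in zip(files, merged):
    out_path = os.path.splitext(src)[0] + '_circuity.geojson'
    with open(out_path, 'w') as f:
        f.write(gdf.to_json())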

How could I speed up my loop?

@LarryLustig I tried to boil it down to the essentials for this question and may have made a typo or inadvertently omitted a part. If I did, please point it out. — Ben Mann, Feb 17 '20 at 20:29

score 0 · Accepted Answer · answered Feb 23 '20 at 21:22

The answer was indeed to build the graph once for each city and then clip it to each neighbourhood polygon. @gboeing suggested this in his answer to this question.
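
A minimal sketch of the clipping step on its own, using a tiny hand-built graph so nothing is downloaded. It assumes OSMnx-style node attributes `x` and `y` (which OSMnx graphs carry) in the same CRS as the polygon; `clip_graph_to_polygon` is a hypothetical helper name. Note that `subgraph()` keeps only edges whose two endpoints are both inside, so streets crossing the neighbourhood boundary are dropped.

```python
import networkx as nx
from shapely.geometry import Point, Polygon


def clip_graph_to_polygon(G, polygon):
    """Hypothetical helper: subgraph of G induced by nodes inside `polygon`.

    Assumes each node has 'x' and 'y' attributes (as in OSMnx graphs)
    expressed in the same CRS as `polygon`.
    """
    inside = [
        node
        for node, data in G.nodes(data=True)
        if polygon.intersects(Point(data["x"], data["y"]))
    ]
    # subgraph() keeps only edges with BOTH endpoints inside;
    # .copy() turns the read-only view into an independent graph.
    return G.subgraph(inside).copy()


# Tiny synthetic street graph: four nodes on a line, edges between neighbours.
G = nx.MultiDiGraph()
for n, x in enumerate([0.0, 1.0, 2.0, 3.0]):
    G.add_node(n, x=x, y=0.0)
for u in range(3):
    G.add_edge(u, u + 1, length=1.0)

# Hypothetical neighbourhood covering nodes 0 and 1 only
neighbourhood = Polygon([(-0.5, -1), (1.5, -1), (1.5, 1), (-0.5, 1)])
sub = clip_graph_to_polygon(G, neighbourhood)
print(sorted(sub.nodes), list(sub.edges()))
```

The edge from node 1 to node 2 crosses the polygon boundary, so it is not part of the clipped graph.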

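import osmnx as ox
import pandas as pd
import geopandas as gpd

merged = []

# Assumes the file is in EPSG:4326 (lat/lon), as graph_from_polygon expects
city = gpd.read_file('C:/folder/file.geojson')
# Build the street network ONCE for the whole city
city_graph = ox.graph_from_polygon(city.unary_union, network_type='drive')
# One point per graph node, indexed by node ID, used to clip the graph
nodes = ox.graph_to_gdfs(city_graph, edges=False)
circ = []

for polygon in city.geometry:
    # Keep only the nodes that fall in this neighbourhood
    intersecting_nodes = nodes[nodes.intersects(polygon)].index
    neighbourhood_graph = city_graph.subgraph(intersecting_nodes)
    stat = ox.basic_stats(neighbourhood_graph)
    circ.append(stat['circuity_avg'])

circ = pd.Series(circ, index=city.index, name='circuity_avg')
merged.append(pd.concat([city, circ], axis=1))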
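
The loop above still runs `nodes.intersects(polygon)` over the full node table once per neighbourhood. A possible variant, sketched under the assumption of geopandas 0.10 or later (for the `predicate` keyword of `gpd.sjoin`): a single spatial join, which uses a spatial index, assigns every node to the neighbourhood(s) it falls in, and each subgraph is then built from one group. `subgraphs_by_neighbourhood` is a hypothetical helper, and whether it beats the plain loop on a real city is worth timing rather than assuming.

```python
import geopandas as gpd
import networkx as nx
from shapely.geometry import Point, box


def subgraphs_by_neighbourhood(G, nodes, neighbourhoods):
    """Hypothetical helper: one subgraph per neighbourhood, from ONE spatial join.

    nodes: GeoDataFrame of points indexed by graph node ID
           (the shape of what ox.graph_to_gdfs(G, edges=False) returns).
    neighbourhoods: GeoDataFrame of polygons in the same CRS.
    Neighbourhoods containing no nodes are absent from the result.
    """
    joined = gpd.sjoin(
        nodes, neighbourhoods[["geometry"]], how="inner", predicate="intersects"
    )
    # 'index_right' holds the index label of the matching neighbourhood;
    # the left index (node IDs) is preserved by the join.
    return {
        hood_id: G.subgraph(group.index).copy()
        for hood_id, group in joined.groupby("index_right")
    }


# Tiny synthetic example: two nodes in each of two square neighbourhoods.
coords = {1: (0.5, 0.5), 2: (0.8, 0.5), 3: (1.5, 0.5), 4: (1.8, 0.5)}
G = nx.MultiDiGraph()
for n, (x, y) in coords.items():
    G.add_node(n, x=x, y=y)
G.add_edge(1, 2)
G.add_edge(2, 3)  # crosses the boundary, so it appears in neither subgraph
G.add_edge(3, 4)

nodes = gpd.GeoDataFrame(
    index=list(coords),
    geometry=[Point(xy) for xy in coords.values()],
    crs="EPSG:4326",
)
hoods = gpd.GeoDataFrame(geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)], crs="EPSG:4326")
subs = subgraphs_by_neighbourhood(G, nodes, hoods)
```

Each value in `subs` can then be passed to `ox.basic_stats` exactly as in the loop above.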
