
After running regular DBSCAN, I got a map with the clusters: [image: map of the DBSCAN clusters]

I'm attaching the nearest network node to each firm (plotted with OSMnx), then creating the network-based distance matrix in order to reproduce network-based spatial clustering from this tutorial.

Speed up the distance matrix computation: rather than calculating the distance from every firm to every other firm, find every node with at least one firm attached, then calculate the distance from every such node to every other such node. Once we have the node-to-node distances, reindex them to get firm-to-firm distances.
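The reindexing step can be sketched with pandas on toy data. The node IDs, firm IDs and distances below are hypothetical; the point is that `DataFrame.reindex` accepts a target index with repeated labels (as long as the source index is unique), so two firms snapped to the same node simply reuse that node's row and column.

```python
import pandas as pd

# Toy node-to-node distance matrix (meters) for three hypothetical
# network nodes that each have at least one firm attached.
node_dm = pd.DataFrame(
    [[0, 120, 450],
     [130, 0, 300],
     [460, 310, 0]],
    index=[11, 22, 33],
    columns=[11, 22, 33],
)

# Four hypothetical firms; firms "b" and "c" snap to the same node (22).
firms = pd.DataFrame({"nn": [11, 22, 22, 33]}, index=["a", "b", "c", "d"])

# Reindex rows and columns by each firm's nearest node, so every
# node-to-node distance is reused for all firms sharing those nodes.
firm_dm = node_dm.reindex(index=firms["nn"], columns=firms["nn"])

# Relabel rows and columns from node IDs to firm IDs.
firm_dm.index = firms.index
firm_dm.columns = firms.index

print(firm_dm)
```

Firms that share a node end up at distance 0 from each other, which is a consequence of snapping to the nearest node rather than measuring to the exact firm location.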

This is the code:

```python
import networkx as nx
import osmnx as ox
import pandas as pd

# attach the nearest network node to each firm (apply Solution B here)
# X and Y are the x (longitude) and y (latitude) coordinates
firms['nn'] = ox.get_nearest_nodes(G, X=firms['x'], Y=firms['y'], method='balltree')
print(len(firms['nn']))

# we'll get distances for each pair of nodes that have firms attached to them
nodes_unique = pd.Series(firms['nn'].unique())
nodes_unique.index = nodes_unique.values
print(len(nodes_unique))

# convert MultiDiGraph to DiGraph for simpler, faster distance matrix computation
G_dm = nx.DiGraph(G)
```

Output:

```
269
230
time: 2.74 s
```
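One detail of the `nx.DiGraph(G)` conversion: it collapses parallel edges between the same pair of nodes into a single edge, but it does not choose among them by `length`, so a longer parallel edge can survive. A minimal sketch of a conversion that keeps the shortest parallel edge; `to_digraph_min_length` is a hypothetical helper (not an OSMnx or NetworkX function), and it copies only the weight attribute onto the edges.

```python
import networkx as nx


def to_digraph_min_length(G, weight="length"):
    """Collapse a MultiDiGraph into a DiGraph, keeping the smallest `weight`
    among parallel edges (hypothetical helper; copies only `weight`)."""
    D = nx.DiGraph()
    D.add_nodes_from(G.nodes(data=True))
    for u, v, data in G.edges(data=True):
        w = data.get(weight, float("inf"))
        # Keep this edge if it is the first u->v edge seen or shorter than the kept one.
        if not D.has_edge(u, v) or w < D[u][v][weight]:
            D.add_edge(u, v, **{weight: w})
    return D


# Two parallel 1->2 edges of different lengths, plus a 2->3 edge.
M = nx.MultiDiGraph()
M.add_edge(1, 2, length=90.0)
M.add_edge(1, 2, length=250.0)
M.add_edge(2, 3, length=40.0)

D = to_digraph_min_length(M)
print(D[1][2]["length"])  # 90.0
print(nx.dijkstra_path_length(D, 1, 3, weight="length"))  # 130.0
```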

Then:

```python
# calculate network-based distance between each node (apply Solution A here)
def network_distance_matrix(u, G, vs=nodes_unique):
    # one Dijkstra search per (u, v) pair; raises NetworkXNoPath if v is unreachable from u
    dists = [nx.dijkstra_path_length(G, source=u, target=v, weight='length') for v in vs]
    return pd.Series(dists, index=vs)
```
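`nx.dijkstra_path_length` runs a separate shortest-path search for every (u, v) pair and raises `NetworkXNoPath` when v cannot be reached. A single `nx.single_source_dijkstra_path_length` call per origin returns the distances to every reachable node at once, and unreachable targets are simply absent from its result. A sketch of that variant with the same call pattern; the name `network_distance_matrix_sssp` is hypothetical, and it returns NaN for unreachable pairs instead of raising.

```python
import networkx as nx
import pandas as pd


def network_distance_matrix_sssp(u, G, vs):
    """Distances from node u to every node in vs, from one Dijkstra run.

    Targets that cannot be reached from u (possible on a directed graph
    that is not strongly connected) are returned as NaN.
    """
    lengths = nx.single_source_dijkstra_path_length(G, u, weight="length")
    return pd.Series([lengths.get(v, float("nan")) for v in vs], index=vs)


# Small directed example: node 3 is a dead end, so only 3 is reachable from 3.
G = nx.DiGraph()
G.add_edge(1, 2, length=100)
G.add_edge(2, 1, length=100)
G.add_edge(2, 3, length=50)

nodes = pd.Series([1, 2, 3], index=[1, 2, 3])
node_dm = nodes.apply(network_distance_matrix_sssp, G=G, vs=list(nodes))
print(node_dm)
```

Because unreachable pairs come back as NaN, a later `node_dm.astype(int)` would fail on such a matrix until those entries are dropped or filled.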

And finally:

```python
%%time
# tqdm.notebook is the public module; tqdm._tqdm_notebook is a deprecated private path
from tqdm.notebook import tqdm
tqdm.pandas()  # registers Series.progress_apply

# create node-based distance matrix called node_dm
node_dm = nodes_unique.progress_apply(network_distance_matrix, G=G_dm)
node_dm = node_dm.astype(int)
print(node_dm.size)
```

Solutions A and B, with special thanks to gboeing:

```python
# OPTION A: remove unsolvable origin/destination nodes and re-try
def network_distance_matrix(u, G, vs=nodes_unique):
    vs = list(vs)
    while True:
        try:
            dists = [nx.dijkstra_path_length(G, source=u, target=v, weight='length') for v in vs]
            return pd.Series(dists, index=vs)
        except nx.exception.NetworkXNoPath:
            # keep only destinations reachable from u, then retry;
            # dropped pairs show up as NaN in node_dm, so fill or drop them before astype(int)
            reachable = nx.descendants(G, u) | {u}
            vs = [v for v in vs if v in reachable]
```

```python
# OPTION B: Use a strongly (instead of weakly) connected graph
Gs = ox.utils_graph.get_largest_component(G, strongly=True)

# attach nearest network node to each firm
firms['nn'] = ox.get_nearest_nodes(Gs, X=firms['x'], Y=firms['y'], method='balltree')
print(len(firms['nn']))

# we'll get distances for each pair of nodes that have firms attached to them
nodes_unique = pd.Series(firms['nn'].unique())
nodes_unique.index = nodes_unique.values
print(len(nodes_unique))

# convert MultiDiGraph to DiGraph for simpler, faster distance matrix computation
G_dm = nx.DiGraph(Gs)
```
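Option B only helps if the largest strongly connected component still covers most of the network: firms whose true nearest node was dropped get snapped to a different, possibly farther node in `Gs`. A small NetworkX-only check, written as a hypothetical helper `connectivity_report`, compares the largest weakly and strongly connected components of a directed graph.

```python
import networkx as nx


def connectivity_report(G):
    """Compare the largest weakly and strongly connected components of a directed graph."""
    largest_weak = max(nx.weakly_connected_components(G), key=len)
    largest_strong = max(nx.strongly_connected_components(G), key=len)
    return {
        "nodes": G.number_of_nodes(),
        "largest_weak": len(largest_weak),
        "largest_strong": len(largest_strong),
        "strongly_connected": nx.is_strongly_connected(G),
    }


# A one-way spur (3 -> 4) makes node 4 a dead end: the graph is weakly
# but not strongly connected, so node 4 is outside the largest strong component.
G = nx.DiGraph([(1, 2), (2, 3), (3, 1), (3, 4)])
report = connectivity_report(G)
print(report)
```

On a street network, a large gap between `largest_weak` and `largest_strong` would suggest many one-way dead ends near the edge of the downloaded area.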
  • Is your graph strongly connected? If it's only weakly connected, then you cannot assume that all nodes are reachable from all other nodes, due to artificial periphery effects. – gboeing Aug 31 '20 at 23:40
  • @gboeing, your code is awesome. After reading your paper, I'm still confused about artificial periphery effects. Since many nodes are unreachable (these firms are the coordinates of health centers in Queretaro city, Mexico), should I set the parameters to `nodes_unique = pd.Series(firms['nn'].unique()); nodes_unique.index = nodes_unique.values; nodes_unique = list(nx.weakly_connected_components(G)); print(len(nodes_unique))`? – Javier Alejandro Rendon Carril Sep 02 '20 at 21:24
  • See also https://stackoverflow.com/a/63713539/7321942 – gboeing Sep 02 '20 at 21:29
  • After doing this, i.e. applying Option B exactly as written above (nearest nodes taken from the largest strongly connected component `Gs`, then `G_dm = nx.DiGraph(Gs)`): – Javier Alejandro Rendon Carril Sep 03 '20 at 22:17
  • I was able to continue with the process, but at the end only one cluster was generated. Is something wrong with the strongly connected graph? – Javier Alejandro Rendon Carril Sep 03 '20 at 22:19
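When DBSCAN puts every firm into one cluster, one thing worth checking is the scale of `eps` relative to the distance matrix: OSMnx edge `length` values are in meters, so `eps` must be in meters too, and an `eps` larger than typical inter-firm network distances merges everything. A minimal sketch, assuming scikit-learn's `DBSCAN` with `metric='precomputed'` on a firm-to-firm matrix; the distances are toy numbers, not data from the question.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy firm-to-firm network distances in meters: firms 0-1 and 2-3 are
# close to each other, and the two pairs are about 5 km apart.
# Directed network distances can be asymmetric; this toy matrix is symmetric,
# and how to symmetrize a real one (e.g. mean or max with its transpose) is a choice left open here.
dm = np.array([
    [0, 100, 5000, 5100],
    [100, 0, 5050, 5000],
    [5000, 5050, 0, 150],
    [5100, 5000, 150, 0],
], dtype=float)

labels_small = DBSCAN(eps=500, min_samples=2, metric="precomputed").fit_predict(dm)
labels_large = DBSCAN(eps=10_000, min_samples=2, metric="precomputed").fit_predict(dm)

print(labels_small)  # [0 0 1 1]
print(labels_large)  # [0 0 0 0]
```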
