After running regular DBSCAN, I got a map with the clusters.
I'm now attaching the nearest network node to each firm with OSMnx, then creating the network-based distance matrix, in order to reproduce the Network-Based Spatial Clustering from this tutorial.
To speed up the distance matrix computation: rather than calculating the distance from every firm to every other firm, find every node with at least one firm attached, then calculate the distance between every pair of such nodes. Once we have the node-to-node distances, reindex the matrix to use those distances firm-to-firm.
This is the code:
import networkx as nx
import osmnx as ox
import pandas as pd

# attach nearest network node to each firm --APPLY SOLUTION B HERE
firms['nn'] = ox.get_nearest_nodes(G, X=firms['x'], Y=firms['y'], method='balltree')
print(len(firms['nn']))
# we'll get distances for each pair of nodes that have firms attached to them
nodes_unique = pd.Series(firms['nn'].unique())
nodes_unique.index = nodes_unique.values
print(len(nodes_unique))
# convert MultiDiGraph to DiGraph for simpler faster distance matrix computation
G_dm = nx.DiGraph(G)
Output:
269
230
time: 2.74 s
Then:
# calculate network-based distance between each node --APPLY SOLUTION A HERE
def network_distance_matrix(u, G, vs=nodes_unique):
    dists = [nx.dijkstra_path_length(G, source=u, target=v, weight='length') for v in vs]
    return pd.Series(dists, index=vs)
And finally:
%%time
from tqdm.notebook import tqdm_notebook
tqdm_notebook.pandas()
# create node-based distance matrix called node_dm
node_dm = nodes_unique.progress_apply(network_distance_matrix, G=G_dm)
node_dm = node_dm.astype(int)
print(node_dm.size)
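The reindexing step mentioned earlier (turning node-to-node distances into firm-to-firm distances) is not shown in the code above. Here is a minimal sketch of how it could look, using a made-up three-node matrix and a hypothetical `firms_demo` frame; none of the values below come from the real data.

```python
import pandas as pd

# toy node-to-node distances in meters (made-up values, not real output)
nodes = [10, 20, 30]
node_dm_demo = pd.DataFrame(
    [[0, 150, 400],
     [150, 0, 250],
     [400, 250, 0]],
    index=nodes, columns=nodes,
)

# hypothetical firms: the first two share nearest node 10
firms_demo = pd.DataFrame({"name": ["a", "b", "c", "d"],
                           "nn": [10, 10, 20, 30]})

# repeat rows/columns so there is one per firm instead of one per node
firm_dm_demo = node_dm_demo.reindex(index=firms_demo["nn"],
                                    columns=firms_demo["nn"])
print(firm_dm_demo.shape)
```

Firms attached to the same node end up at distance 0 from each other, which follows directly from snapping them to a shared node.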
Solutions A and B (special thanks to gboeing):
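# OPTION A: recursively remove unsolvable origin/destination nodes and re-try
def network_distance_matrix(u, G, vs=nodes_unique):
    G2 = G.copy()
    targets = list(vs)
    solved = False
    while not solved:
        try:
            dists = [nx.dijkstra_path_length(G2, source=u, target=v, weight='length') for v in targets]
            solved = True
        except nx.exception.NetworkXNoPath:
            # drop every target that cannot be reached from u, then retry
            reachable = nx.descendants(G2, u) | {u}
            G2.remove_nodes_from([v for v in targets if v not in reachable])
            targets = [v for v in targets if v in reachable]
    return pd.Series(dists, index=targets)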
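As a possible alternative to the retry loop in Option A, one Dijkstra run per source node returns distances to every reachable node, so unreachable pairs can simply be left as NaN. This is only a sketch under that assumption: `distances_from`, `G_demo` and `vs_demo` are hypothetical names, and the toy graph is made up.

```python
import networkx as nx
import pandas as pd

def distances_from(u, G, vs):
    """Distances from u to each node in vs; NaN where no path exists."""
    # a single Dijkstra run from u covers every node reachable from u
    lengths = nx.single_source_dijkstra_path_length(G, u, weight="length")
    return pd.Series(lengths, dtype="float64").reindex(vs)

# toy directed graph with made-up lengths: 2 is reachable from 1, 3 is not
G_demo = nx.DiGraph()
G_demo.add_edge(1, 2, length=100)
G_demo.add_edge(3, 1, length=50)

vs_demo = pd.Series([1, 2, 3], index=[1, 2, 3])
dm_demo = vs_demo.apply(distances_from, G=G_demo, vs=vs_demo)
print(dm_demo)
```

The NaN entries would have to be filled or dropped before a later `astype(int)`, since NaN cannot be cast to an integer.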
# OPTION B: Use a strongly (instead of weakly) connected graph
Gs = ox.utils_graph.get_largest_component(G, strongly=True)
# attach nearest network node to each firm
firms['nn'] = ox.get_nearest_nodes(Gs, X=firms['x'], Y=firms['y'], method='balltree')
print(len(firms['nn']))
# we'll get distances for each pair of nodes that have firms attached to them
nodes_unique = pd.Series(firms['nn'].unique())
nodes_unique.index = nodes_unique.values
print(len(nodes_unique))
# convert MultiDiGraph to DiGraph for simpler faster distance matrix computation
G_dm = nx.DiGraph(Gs)
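To illustrate why Option B avoids the no-path error, here is a small sketch on a made-up directed graph. It uses plain NetworkX calls rather than the OSMnx helper, on the assumption that keeping the largest strongly connected component is what the helper does when called with `strongly=True`; `G_demo` and `Gs_demo` are hypothetical names.

```python
import networkx as nx

# toy one-way street layout (made-up): 1 -> 2 -> 3 -> 1 form a loop,
# and node 4 can be entered from 3 but never left
G_demo = nx.DiGraph([(1, 2), (2, 3), (3, 1), (3, 4)])

print(nx.is_weakly_connected(G_demo))    # one piece if direction is ignored
print(nx.is_strongly_connected(G_demo))  # but not every pair has a directed path

# keep only the largest strongly connected component
largest = max(nx.strongly_connected_components(G_demo), key=len)
Gs_demo = G_demo.subgraph(largest).copy()
print(sorted(Gs_demo.nodes))
```

Within a strongly connected graph every node can reach every other node, so a firm snapped to one of its nodes always has a directed path to any other node in that graph.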