
Let's say I have an unweighted directed graph. I was wondering if there is a way to store all the distances between a starting node and all the remaining nodes of the graph. I know Dijkstra's algorithm could be an option, but I'm not sure it would be the best one, since I'm working with a pretty big graph (~100k nodes) and it is unweighted. My thought so far was to perform a BFS and store all the distances along the way. Is this a feasible approach?

Finally, since I'm pretty new to graph theory, could someone maybe point me in the right direction for a good Python implementation of this kind of problem?

  • This is what you are looking for https://en.wikipedia.org/wiki/Floyd%E2%80%93Warshall_algorithm : ) – Minarth Jan 10 '21 at 21:36
  • @Minarth I'm not sure this is right for what I'm trying to achieve here. I'm searching for the distances between just one node and all the others, so this seems a bit overkill. Am I wrong? I'm asking because, with that many nodes, it just doesn't seem feasible (at least with the SciPy implementation). – mikcnt Jan 10 '21 at 22:51

1 Answer


Definitely feasible, and pretty fast if your data structure gives you, for each start node, the list of its end nodes, indexed by the start node's identifier.

Here's an example using a dictionary for edges: {startNode:list of end nodes}

from collections import deque

maxDistance = 0

def getDistances(origin, edges):
    global maxDistance
    maxDistance = 0
    distances = {origin: 0}        # {endNode: distance from origin}
    toLink    = deque([origin])    # start at origin (distance = 0)
    while toLink:
        start = toLink.popleft()              # previous end, will chain to next
        dist  = distances[start] + 1          # its successors are at +1
        for end in edges.get(start, ()):      # next end nodes (.get avoids KeyError on nodes with no outgoing edges)
            if end in distances: continue     # new ones only
            distances[end] = dist             # record distance
            toLink.append(end)                # will link onward from there
            maxDistance = max(maxDistance, dist)

    return distances

This is a plain BFS: it visits each reachable node exactly once, follows each edge at most once, and uses fast dictionary lookups to check whether a node has already been seen.
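
For example, on a tiny made-up graph (node names here are purely illustrative), using the same {startNode: set of end nodes} format:

# tiny illustrative graph
smallEdges = {
    'A': {'B', 'C'},
    'B': {'D'},
    'C': {'D'},
    'D': set(),
}

print(getDistances('A', smallEdges))
# {'A': 0, 'B': 1, 'C': 1, 'D': 2}   (B and C may print in either order)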

Using some random test data (10 million edges) ...

import random
from collections import defaultdict

print("loading simulated graphs")
vertexCount = 100000
edgeCount   = vertexCount * 100
edges       = defaultdict(set)
edgesLoaded = 0
minSpan     = 1  # minimum index gap between endpoints (also rules out self-loops)
while edgesLoaded<edgeCount:
    start = random.randrange(vertexCount)
    end   = random.randrange(vertexCount)
    if abs(start-end) > minSpan and end not in edges[start]:
        edges[start].add(end)
        edgesLoaded += 1
print("loaded!")

Performance:

# starting from a randomly selected node
origin    = random.choice(list(edges.keys())) 

from timeit import timeit
t = timeit(lambda:getDistances(origin,edges),number=1)

print(f"{t:.2f} seconds for",edgeCount,"edges", "max distance = ",maxDistance)

# 3.06 seconds for 10000000 edges max distance =  4        
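
Since the question also asks for a pointer to an existing Python implementation: networkx exposes the same single-source BFS as single_source_shortest_path_length. Here's a minimal sketch, assuming the edges dictionary from above; note that building the DiGraph for millions of edges has its own time and memory cost, so for a one-off query the hand-rolled BFS may well be faster:

import networkx as nx

# build a directed graph from the {startNode: set of end nodes} dictionary
G = nx.DiGraph()
G.add_edges_from((start, end) for start, ends in edges.items() for end in ends)

# {node: hop count from origin} for every node reachable from origin
distances = nx.single_source_shortest_path_length(G, origin)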