4

I am trying to find Strongly Connected Components in a large graph implementing Kosaraju's algorithm. It requires running a DFS on a graph in reverse, and then forward. If you're interested the list of edges for this graph are here: https://dl.dropboxusercontent.com/u/28024296/SCC.txt.tar.gz

I can't implement this recursively in Python, it exceeds its recursive limits and crashes if I increase them. I'm trying to implement through iteration.

Below is my code for 1. Loading the graph in reverse into a Dictionary, and 2. Running the DFS on it iteratively for each node from n -> 1.

This code runs perfect for small sample graphs but just doesn't run for this large graph. I get it's inefficient but any tips on how to make it work?

def reverseFileLoader():

    graph = collections.defaultdict(lambda: {'g': [], 's': False, 't': None, 'u': None })
    for line in open('/home/edd91/Documents/SCC.txt'):
        k, v = map(int, line.split())
        graph[v]['g'].append(k)

    return graph

def DFS(graph, i):
    global t
    global source
    stack = []
    seen = []
    seen.append(i)
    stack.append(i)

    while stack:
        s = stack[-1]
        j = len(graph[s]['g']) - 1
        h = 0
        while (j >= 0):
            if graph[s]['g'][j] not in seen and graph[graph[s]['g'][j]]['t'] == None:
                seen.append(graph[s]['g'][j])
                stack.append(graph[s]['g'][j])
                h += 1
            j -= 1

        if h == 0:
            if graph[s]['t'] == None:
                t += 1
                graph[s]['u'] = source
                graph[s]['t'] = t 
            stack.pop()

def DFSLoop(graph):
    global t
    t = 0
    global source
    source = None
    i = len(graph)
    while (i >= 1):
        print "running for " + str(i)
        source = i
        DFS(graph, i)
        i -= 1
OmG
  • 18,337
  • 10
  • 57
  • 90
edd91
  • 185
  • 8

1 Answers1

2

Kosaraju's algorithm probably requires that checking whether an element has been seen is an O(1) operation. But your seen data structure has O(n) time membership testing. Converting seen from a list to a set makes the code execute in a few seconds on my system (after also removing the prints which took up most of the remaining execution time).

For completeness the changes you need to make are

  • Change seen = [] to seen = set()
  • Change each seen.append(...) to seen.add(...).
Trevor Merrifield
  • 4,541
  • 2
  • 21
  • 24
  • 1
    Thanks @trevor-merrifield ! Isn't it amazing such small changes make the code run, that's so helpful. I assume a set is hashed somehow? – edd91 Sep 02 '16 at 13:14
  • Yep the source code [setobject.h](https://github.com/python/cpython/blob/master/Include/setobject.h) [setobject.c](https://github.com/python/cpython/blob/master/Objects/setobject.c) shows it is like a hash table without values. Hence the O(1)-ish lookup time. – Trevor Merrifield Sep 02 '16 at 19:23