15

I implemented the Tarjan's strongly connected components algorithm, according to wikipedia, in Python, but it isn't working. The algorithm is quite short and I cannot find any difference, so I cannot tell why it isn't working. I tried to check the original paper, but could not find it.

Here is the code.

def strongConnect(v):
  global E, idx, CCs, c, S
  idx[v] = (c, c) #idx[v][0] for v.index # idx[v][1] for v.lowlink
  c += 1
  S.append(v)  
  for w in [u for (v2, u) in E if v == v2]:
    if idx[w][0] < 0:
      strongConnect(w)
      # idx[w] = (idx[w][0], min(idx[v][1], idx[w][1])) #fixed, thx
      idx[v] = (idx[v][0], min(idx[v][1], idx[w][1]))
    elif w in S:
      idx[v] = (idx[v][0], min(idx[v][1], idx[w][0]))
  if (idx[v][0] == idx[v][1]):
    i = S.index(v)
    CCs.append(S[i:])
    S = S[:i]

E = [('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'E'), ('E', 'A'), ('A', 'E'), ('C', 'A'), ('C', 'E'), ('D', 'F'), ('F', 'B'), ('E', 'F')]
idx = {}
CCs = []
c = 0
S = []
for (u, v) in E:
  idx[u] = (-1, -1)
  idx[v] = (-1, -1)
for v in idx.keys():
  if idx[v][0] < 0:
    strongConnect(v)

print(CCs)

You can check the graph visually if you prefer. As you can see this is a quite forward translation of the pseudocode in wikipedia. However, this is the output:

[['D', 'E', 'F'], ['B', 'C'], ['A']]

There should be only one strongly connected component, not three. I hope the question is right in all its aspects, if not I'm sorry. In any case, thank you very much.

jmora
  • 491
  • 3
  • 14
  • 2
    Algorithmic descriptions and Python are a collision of unreadable and readable which makes me not want to parse this. Here is Tarjan[1972] implemented more readably: http://www.bitformation.com/art/python_toposort.html – msw Jul 04 '11 at 19:01
  • And another: http://www.logarithmic.net/pfh-files/blog/01208083168/sort.py – msw Jul 04 '11 at 19:03
  • that works, msw, thank you, I'll check that, my implementation and wikipedia and try to fix it. Thank you. – jmora Jul 04 '11 at 19:42
  • 2
    You might also want to check out the implementation (a slight variation of Tarjan) in `networkx` which is a really nice python graph analysis package: http://networkx.lanl.gov/reference/generated/networkx.algorithms.components.strongly_connected.strongly_connected_components.html#networkx.algorithms.components.strongly_connected.strongly_connected_components – JoshAdel Jul 04 '11 at 19:47
  • This is just a prototype, I have to modify this algorithm to suit my needs in Java, along with [an algorithm to find strong articulation points](http://www.springerlink.com/content/gt07037641261750/) and one to [calculate dominator tree](http://stackoverflow.com/questions/249720/efficient-way-to-recursively-calculate-dominator-tree) as soon as I find it. But the library is cool, so I'll keep it for future reference, thank you. – jmora Jul 05 '11 at 11:32
  • The image in your original post is out of date: it no longer renders anything resembling an image – mj_codec Oct 06 '22 at 05:38

2 Answers2

16

Ok, I had some more time to think about this. I'm no longer certain that filtering the edges was the problem, as I previously stated. In fact, I think there's an ambiguity in the pseudocode; does for each (v, w) in E mean for each edge (as the literal meaning of for each suggests), or only each edge beginning with v, (as you reasonably assumed)? Then, after the for loop, is the v in question the final v from the for loop, as it would be in Python? Or does that go back to being the original v? Pseudocode doesn't have clearly defined scoping behavior in this case! (It would be really weird if the v at the end were to be the last, arbitrary, value of v from the loop. That suggests that filtering is correct, because in that case, v means the same thing all the way through.)

However, under any circumstances, the clear error in your code is here:

  idx[w] = (idx[w][0], min(idx[v][1], idx[w][1]))

According to the pseudocode, that should definitely be

  idx[v] = (idx[v][0], min(idx[v][1], idx[w][1]))

Once you make that change, you get the expected result. Frankly it doesn't surprise me that you made that mistake, because you're using a really weird and counterintuitive data structure. Here's what I think is an improvement -- it adds only a few more lines, and I find it to be much more readable.

import itertools

def strong_connect(vertex):
    global edges, indices, lowlinks, connected_components, index, stack
    indices[vertex] = index
    lowlinks[vertex] = index
    index += 1
    stack.append(vertex)

    for v, w in (e for e in edges if e[0] == vertex):
        if indices[w] < 0:
            strong_connect(w)
            lowlinks[v] = min(lowlinks[v], lowlinks[w])
        elif w in stack:
            lowlinks[v] = min(lowlinks[v], indices[w])

    if indices[vertex] == lowlinks[vertex]:
        connected_components.append([])
        while stack[-1] != vertex:
            connected_components[-1].append(stack.pop())
        connected_components[-1].append(stack.pop())

edges = [('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'E'), 
         ('E', 'A'), ('A', 'E'), ('C', 'A'), ('C', 'E'), 
         ('D', 'F'), ('F', 'B'), ('E', 'F')]
vertices = set(v for v in itertools.chain(*edges))
indices = dict((v, -1) for v in vertices)
lowlinks = indices.copy()
connected_components = []

index = 0
stack = []
for v in vertices:
    if indices[v] < 0:
        strong_connect(v)

print(connected_components)

However, I find the use of global variables here distasteful. You could hide this away in its own module, but I prefer the idea of creating a callable class. After looking more closely at Tarjan's original pseudocode, (which confirms that the "filtered" version is correct, by the way), I wrote this. It includes a simple Graph class and does couple of basic tests:

from itertools import chain
from collections import defaultdict

class Graph(object):
    def __init__(self, edges, vertices=()):
        edges = list(list(x) for x in edges)
        self.edges = edges
        self.vertices = set(chain(*edges)).union(vertices)
        self.tails = defaultdict(list)
        for head, tail in self.edges:
            self.tails[head].append(tail)

    @classmethod
    def from_dict(cls, edge_dict):
        return cls((k, v) for k, vs in edge_dict.iteritems() for v in vs)

class _StrongCC(object):
    def strong_connect(self, head):
        lowlink, count, stack = self.lowlink, self.count, self.stack
        lowlink[head] = count[head] = self.counter = self.counter + 1
        stack.append(head)

        for tail in self.graph.tails[head]:
            if tail not in count:
                self.strong_connect(tail)
                lowlink[head] = min(lowlink[head], lowlink[tail])
            elif count[tail] < count[head]:
                if tail in self.stack:
                    lowlink[head] = min(lowlink[head], count[tail])

        if lowlink[head] == count[head]:
            component = []
            while stack and count[stack[-1]] >= count[head]:
                component.append(stack.pop())
            self.connected_components.append(component)

    def __call__(self, graph):
        self.graph = graph
        self.counter = 0
        self.count = dict()
        self.lowlink = dict()
        self.stack = []
        self.connected_components = []

        for v in self.graph.vertices:
            if v not in self.count:
                self.strong_connect(v)

        return self.connected_components

strongly_connected_components = _StrongCC()

if __name__ == '__main__':
    edges = [('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'E'),
             ('E', 'A'), ('A', 'E'), ('C', 'A'), ('C', 'E'),
             ('D', 'F'), ('F', 'B'), ('E', 'F')]
    print strongly_connected_components(Graph(edges))
    edge_dict = {'a':['b', 'c', 'd'],
                 'b':['c', 'a'],
                 'c':['d', 'e'],
                 'd':['e'],
                 'e':['c']}
    print strongly_connected_components(Graph.from_dict(edge_dict))
senderle
  • 145,869
  • 36
  • 209
  • 233
  • Ooops. Thank you, that's exactly the problem. I tried with some more test cases. Also the algorithm is ambiguous and confusing (at least to me) that's why I wanted to have this implementation to check I understood it. Thank you again. – jmora Jul 05 '11 at 11:35
  • Can you explain why we need 'elif w in stack' instead of just updating 'lowlinks[v] = min(lowlinks[v], indices[w])' if w has already been visited? – Minh Pham Mar 29 '13 at 08:08
  • @MinhPham, it's been a while since I carefully thought about this algorithm, but I know that `elif w in stack` is not the same condition as "if w has already been visited". Sometimes, `w` has been visited but is not in the stack, because it has been removed from the stack and placed in the list of connected components. I _think_ this corresponds to situations in which the `v-w` edge connects two components in one direction (from v in one component to w in the other), but there is no connection in the other direction, so the two components are not strongly connected to each other. – senderle Mar 29 '13 at 12:11
  • 1
    @JonasGröger, glad you created that! Do note that user contributions to Stack Overflow are under a [creative commons license with attribution required](http://blog.stackoverflow.com/2009/06/attribution-required/). It would be great if you could add attribution to that gist. Thanks! – senderle Apr 25 '13 at 12:06
0

I modified senderle's answer for Python 3.6+ and added type hints and comments so that it made more sense to me.

from itertools import chain
from collections import defaultdict
from typing import Iterable, DefaultDict, List, Dict, Generic, TypeVar, Tuple, Set

T = TypeVar('T')  # label for a vertex

class Graph(Generic[T]):
    def __init__(self, edges: Iterable[Tuple[T, T]], vertices: Iterable[T] = ()):
        edges = [list(x) for x in edges]
        self.edges = edges
        self.vertices: Set[T] = set(chain(*edges)).union(vertices)
        self.adj_list: DefaultDict[T, List[T]] = defaultdict(list)  # i.e., neighbors of a given node
        for head, tail in self.edges:
            self.adj_list[head].append(tail)

    @classmethod
    def from_dict(cls, edge_dict: Dict[T, Iterable[T]]):
        return cls((k, v) for k, vs in edge_dict.items() for v in vs)

def strongly_connected_components(graph: Graph[T]) -> List[List[T]]:
    idx = 0  # index to be assigned to the next node
    node_idxs: Dict[T, int] = {}  # index of a visited node
    lowlink: Dict[T, int] = {}  # low-link number is the lowest node number (index) reachable by the node that is in the same connected component – its own number, or the low-link number of a previous unvisited neighbor, or the node number of a visited neighbor in the stack
    stack: List[T] = []
    connected_components: List[List[T]] = []

    def visit(head: T) -> None:
        nonlocal idx
        lowlink[head] = node_idxs[head] = idx
        idx += 1
        stack.append(head)

        for neighbor in graph.adj_list[head]:
            if neighbor not in node_idxs:  # i.e., not visited
                visit(neighbor)
                lowlink[head] = min(lowlink[head], lowlink[neighbor])
            elif node_idxs[neighbor] < node_idxs[head]:
                if neighbor in stack:
                    lowlink[head] = min(lowlink[head], node_idxs[neighbor])

        if lowlink[head] == node_idxs[head]:
            component: List[T] = []
            while stack and node_idxs[stack[-1]] >= node_idxs[head]:
                component.append(stack.pop())
            connected_components.append(component)

    for v in graph.vertices:
        if v not in node_idxs:
            visit(v)
    return connected_components

if __name__ == '__main__':
    edges = [('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'E'),
                ('E', 'A'), ('A', 'E'), ('C', 'A'), ('C', 'E'),
                ('D', 'F'), ('F', 'B'), ('E', 'F')]
    print(strongly_connected_components(Graph(edges)))  # [['F', 'D', 'C', 'B', 'A', 'E']]
    edge_dict = {'a':['b', 'c', 'd'],
                    'b':['c', 'a'],
                    'c':['d', 'e'],
                    'd':['e'],
                    'e':['c']}
    print(strongly_connected_components(Graph.from_dict(edge_dict)))  # [['e', 'd', 'c'], ['b', 'a']]
mic
  • 1,190
  • 1
  • 17
  • 29