How to find the connected components of a networkx graph? (not using the nx.connected_components() command)

Question

I have created an undirected graph using networkx, and I need to find a list of all its connected components.

connected_components = {}
def dfs(node):
    global connected_components, G  
    if node not in connected_components:
        connected_components[node] = set()
        for next in G.adj[node]:
            dfs(next)
            connected_components[node] = connected_components[next]
        connected_components[node].add(node)

for node_ in G:
    dfs(node_)

connected_comp_as_tuples = map(tuple, connected_components.values())
unique_components = set(connected_comp_as_tuples)
CC=list(unique_components)

I've tried using this code but the result is not the same as the one given using the nx.connected_components() command. What am I doing wrong?

What you are doing wrong is called 're-inventing the wheel' https://en.wikipedia.org/wiki/Reinventing_the_wheel — ravenspoint, Aug 28 '23 at 16:29
How do I hammer in a nail, but not using a hammer, even though I have a hammer. I'm not going to explain why I won't use the hammer. — wim, Aug 28 '23 at 16:33

score 0 · Answer 1 · answered Aug 28 '23 at 16:42

You're resetting the connected_components[node] set every single time you run the loop. Instead of connected_components[node] = connected_components[next] it should be something along the lines of connected_components[node] = connected_components[node].union(connected_components[next]). BTW it's unclear why you want to write this yourself in the first place, but for future reference, there is pretty good DFS pseudocode on Wikipedia https://en.wikipedia.org/wiki/Depth-first_search.

score 0 · Accepted Answer · answered Aug 28 '23 at 22:08

connected_components[node] = connected_components[next] is also executed when next is actually the "parent" from which the DFS call came. When that happens, the current node can lose the set that had already collected its real "descendants".

For instance, suppose you have a graph with 4 nodes in which node 2 is connected to each of nodes 0, 1 and 3, and the edges were added in an order such that G.adj looks like this:

[
    [2],
    [2],
    [3, 1, 0],
    [2]
]

...then the DFS calls first go from 0 to 2 to 3, and then from 2 to 1. At that moment all is OK, and connected_components[2] has {1, 3}. But then the problem arises: from 2 we visit 0. Node 0 is still associated with an empty set, so node 2 also gets associated with that empty set and loses its connection with the component that holds nodes 1 and 3.
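To see this concretely, here is a minimal reproduction sketch. It assumes a plain dict standing in for G.adj (so the neighbour order is fixed to the one listed above), and the names adj, components and buggy_dfs are only illustrative; the logic inside the function is the same as in the question.

```python
# Plain-dict stand-in for G.adj (an assumption), using the neighbour order from the example.
adj = {0: [2], 1: [2], 2: [3, 1, 0], 3: [2]}

components = {}

def buggy_dfs(node):
    # Same logic as the code in the question.
    if node not in components:
        components[node] = set()
        for nxt in adj[node]:
            buggy_dfs(nxt)
            # When nxt is the "parent" (0, seen from 2), this rebinds the
            # current node to the parent's still-empty set.
            components[node] = components[nxt]
        components[node].add(node)

for start in adj:
    buggy_dfs(start)

unique = {frozenset(s) for s in components.values()}
print(sorted(sorted(c) for c in unique))  # prints [[0, 2], [1, 3]]
```

The graph has a single connected component, yet the code reports two: nodes 1 and 3 stay in the set that node 2 abandoned when it was rebound to node 0's set.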

You should just implement the standard algorithm, using a visited marker:

def find_components(G):
    visited = set()
    
    def dfs(node):
        if node not in visited:
            visited.add(node)
            yield node
            for nxt in G.adj[node]:
                yield from dfs(nxt)
    
    for node in G:
        if node not in visited:
            yield tuple(dfs(node))  # a single component

connected_components = list(find_components(G))
print(connected_components)
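One caveat with the recursive version: it nests one call per node along a DFS path, so a long chain of nodes can exceed Python's default recursion limit (usually 1000). If that matters for your graph, a variant with an explicit stack avoids it. The following is only a sketch under some assumptions: find_components_iterative is a made-up name, and adj is taken to be any mapping from a node to its neighbours, such as a plain dict or G.adj. The nodes inside a component may come out in a different order than with the recursive DFS, but the components themselves are the same.

```python
def find_components_iterative(adj):
    """Yield each connected component as a tuple of nodes.

    `adj` is assumed to map every node to an iterable of its neighbours.
    """
    visited = set()
    for start in adj:
        if start in visited:
            continue
        visited.add(start)
        stack = [start]          # nodes discovered but not yet expanded
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in adj[node]:
                if nxt not in visited:
                    visited.add(nxt)   # mark on push so a node is stacked only once
                    stack.append(nxt)
        yield tuple(component)

# A path of 5000 nodes (deeper than the default recursion limit) plus one isolated node.
n = 5000
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}
adj["isolated"] = []

components = list(find_components_iterative(adj))
print([len(c) for c in components])  # prints [5000, 1]
```

With a networkx graph you would call it as list(find_components_iterative(G.adj)).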
