Here is an alternative using the suggested reduction to a graph problem. I hope the code is clear enough, but I'll still add a few explanations.
Convert to an adjacency list
Just because it's easier to work with:
from collections import defaultdict

edges = [
    ['a', 'b'],
    ['a', 'c'],
    ['b', 'c'],
    ['c', 'd'],
    ['e', 'f'],
    ['f', 'g'],
    ['x', 'y'],
]

def graph_from_edges(edges):
    # Undirected graph: record each edge in both directions.
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph

graph = graph_from_edges(edges)
The graph now contains:
{
'a': {'c', 'b'},
'b': {'c', 'a'},
'c': {'d', 'b', 'a'},
'd': {'c'},
'e': {'f'},
'f': {'e', 'g'},
'g': {'f'},
'x': {'y'},
'y': {'x'}
}
Find the connected component of a given node
This is a simpler sub-problem to solve: we start from a given node and explore its neighbours until no unvisited node remains reachable:
def connected_component_from(graph, starting_node):
    # Wrap the node in a set literal: set(starting_node) would split a
    # multi-character name into its individual characters.
    nodes = {starting_node}
    visited = set()
    while nodes:
        node = nodes.pop()
        yield node
        visited.add(node)
        # Queue only the neighbours that have not been visited yet.
        nodes |= graph[node] - visited

print(list(connected_component_from(graph, 'a')))
This prints the list of nodes in the connected component of node 'a' (the order may vary, since sets are unordered):
['a', 'b', 'c', 'd']
Finding all connected components
Now we just need to repeat the previous operation until we have visited all nodes in the graph. To discover new unexplored components, we simply pick a random unvisited node and start over:
import random

def connected_components(graph):
    all_nodes = set(graph.keys())
    visited = set()
    while all_nodes - visited:
        starting_node = random_node(all_nodes - visited)
        connected_component = set(connected_component_from(graph, starting_node))
        yield connected_component
        visited |= connected_component

def random_node(nodes):
    # random.sample() no longer accepts sets (Python 3.11+), so convert to a
    # list first; random.choice() returns a single node rather than a list.
    return random.choice(list(nodes))
graph_cc = list(connected_components(graph))
print(graph_cc)
Which prints (the order of the components varies between runs because of the random pick):
[{'a', 'c', 'd', 'b'}, {'g', 'e', 'f'}, {'y', 'x'}]
Shortcut
You could also use an existing library to compute these connected components for you, for example networkx:
import networkx as nx
G = nx.Graph()
G.add_edges_from(edges)
cc = list(nx.connected_components(G))
print(cc)
Which also prints:
[{'a', 'c', 'd', 'b'}, {'g', 'e', 'f'}, {'y', 'x'}]
In practice that would be the best solution, but it's less interesting if the goal is to learn new things. Note that you can read networkx's implementation of this function, which uses a BFS.
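For comparison, a breadth-first traversal in the strict sense can be sketched with collections.deque. This is only an illustration and not networkx's actual code: the name bfs_component is made up here, the sample graph is rebuilt so the block runs on its own, and the neighbours are sorted purely to make the visiting order reproducible.

```python
from collections import defaultdict, deque


def bfs_component(graph, start):
    """Return the nodes reachable from `start`, in breadth-first order."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()  # FIFO: the oldest discovered node comes first
        order.append(node)
        # Sorting is an assumption made only so the output is reproducible.
        for neighbour in sorted(graph[node]):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return order


# Same sample edges as in the answer, rebuilt here for self-containment.
edges = [['a', 'b'], ['a', 'c'], ['b', 'c'], ['c', 'd'],
         ['e', 'f'], ['f', 'g'], ['x', 'y']]
graph = defaultdict(set)
for u, v in edges:
    graph[u].add(v)
    graph[v].add(u)

print(bfs_component(graph, 'a'))  # ['a', 'b', 'c', 'd']
```

The only real difference with the set-based version is the queue: popleft() always takes the node that was discovered first, whereas set.pop() removes an arbitrary one.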
Going back to the original problem
We managed to group the nodes by connected component, but that's not what you asked for, so we need to get the original lists back.
To do this a bit faster on large graphs, one possibility is to first have a map from node names to their connected component index in the previous list:
node_cc_index = {u: i for i, cc in enumerate(graph_cc) for u in cc}
print(node_cc_index)
Which gives (from a run where the random pick found the components in a different order than above):
{'g': 0, 'e': 0, 'f': 0, 'a': 1, 'c': 1, 'd': 1, 'b': 1, 'y': 2, 'x': 2}
We can use that to fill the list of edges split as you first requested:
edges_groups = [[] for _ in graph_cc]
for u, v in edges:
    edges_groups[node_cc_index[u]].append([u, v])

print(edges_groups)
Which finally gives:
[
[['e', 'f'], ['f', 'g']],
[['a', 'b'], ['a', 'c'], ['b', 'c'], ['c', 'd']],
[['x', 'y']]
]
Each sublist keeps the original order of its edges, but the order between the sublists is not preserved in any way (it's a direct result of the random choice we made). If that's a problem, we could avoid it by replacing the random pick with picking the "first" unvisited node.
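A minimal sketch of that deterministic variant, assuming "first" means first appearance in the edge list. The function name split_edges_by_component is hypothetical, and the exploration uses a plain stack instead of the generator shown earlier; it relies on dicts keeping insertion order (Python 3.7+).

```python
from collections import defaultdict


def split_edges_by_component(edges):
    """Group edges by connected component, ordered by first appearance."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)

    node_cc_index = {}  # node -> index of its connected component
    n_components = 0
    # Iterating over the dict visits nodes in the order they first appear
    # in `edges`, so components are numbered in that order too.
    for start in list(graph):
        if start in node_cc_index:
            continue  # already reached from an earlier starting node
        stack = [start]
        while stack:
            node = stack.pop()
            if node in node_cc_index:
                continue
            node_cc_index[node] = n_components
            stack.extend(n for n in graph[node] if n not in node_cc_index)
        n_components += 1

    groups = [[] for _ in range(n_components)]
    for u, v in edges:
        groups[node_cc_index[u]].append([u, v])
    return groups


edges = [['a', 'b'], ['a', 'c'], ['b', 'c'], ['c', 'd'],
         ['e', 'f'], ['f', 'g'], ['x', 'y']]
print(split_edges_by_component(edges))
```

With the sample edges, the groups come out in the same order as their first edge appears in the input: the 'a' to 'd' component, then 'e' to 'g', then 'x' and 'y'.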