Given n tuples representing pairs, return a list with connected tuples

Question

Given n tuples, write a function that will return a list with connected values.

Example:

pairs = [(1,62),
    (1,192),
    (1,168),
    (64,449),
    (263,449),      
    (192,289),
    (128,263),
    (128,345),
    (3,10),
    (10,11)
    ]

result:

[[1,62,192,168,289],
 [64,449,263,128,345,449],
 [3,10,11]]

I believe it could be solved using graphs or trees as the data structure, creating a node for each value and an edge for each pair, with each tree or graph representing connected values, but I haven't found a solution yet.

What would be the best way in Python to produce a list of connected values for those pairs?

@thefourtheye, if there is a loop the result would be [1,62,192,168,289, 64,449,263,128,345,449,3,10,11] — Arian Pasquali, Mar 11 '15 at 07:36
Your second component in the result contains `449` twice. Is that intended? — Francis Colas, Mar 11 '15 at 08:52
Possible duplicate of [How to aggregate matching pairs into "connected components" in Python](https://stackoverflow.com/questions/27967093/how-to-aggregate-matching-pairs-into-connected-components-in-python) — qwr, Aug 21 '18 at 20:17

score 8 · Answer 1 · edited May 23 '17 at 10:32

You can solve it with a Disjoint Set (Union-Find) implementation.

Initialize the structure djs with all of the numbers. Then, for each tuple (x, y), call djs.merge(x, y). Finally, for each number x, pick an arbitrary y from each existing set: if djs.sameSet(x, y) == false for all of them, create a new set for x; otherwise add x to the set whose y matched.

Maybe that could help you.
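
The steps above can be sketched roughly as follows. This is only a minimal illustration, not the implementation the answer links to: the class `DisjointSet`, its methods `find`, `merge` and `same_set`, and the helper `connected_values` are names assumed here for the example. Instead of comparing each number against every existing set, the sketch groups the numbers by their root, which gives the same partition.

```python
class DisjointSet:
    """Minimal union-find structure (illustrative names, not a library API)."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Register unseen items as their own root.
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def merge(self, x, y):
        root_x, root_y = self.find(x), self.find(y)
        if root_x != root_y:
            self.parent[root_y] = root_x

    def same_set(self, x, y):
        return self.find(x) == self.find(y)


def connected_values(pairs):
    djs = DisjointSet()
    for x, y in pairs:
        djs.merge(x, y)

    # Collect every number under the root of its set.
    groups = {}
    for x in djs.parent:
        groups.setdefault(djs.find(x), []).append(x)
    return list(groups.values())


pairs = [(1, 62), (1, 192), (1, 168), (64, 449), (263, 449),
         (192, 289), (128, 263), (128, 345), (3, 10), (10, 11)]
print(connected_values(pairs))
```

With the pairs from the question this yields three lists, one per connected group, and each value appears exactly once.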

answered Mar 11 '15 at 07:45

avim


The second step can be to simply extract the disjoint sets as lists and convert them to tuples. – Dan D. Mar 11 '15 at 08:23

score 5 · Answer 2 · answered Nov 08 '19 at 08:59

You could also use networkx as a dependency. Its connected_components function yields each component as a set of nodes.

import networkx as nx

pairs = [(1,62),
        (1,192),
        (1,168),
        (64,449),
        (263,449),      
        (192,289),
        (128,263),
        (128,345),
        (3,10),
        (10,11)]


G = nx.Graph()
G.add_edges_from(pairs)
list(nx.connected_components(G))

score 4 · Accepted Answer · answered Mar 11 '15 at 08:24

I didn't know this problem already had a name (thanks avim!), so I went ahead and solved it naively.

This solution is somewhat similar to Eli Rose's. I decided to post it anyway because it is a bit more efficient for large lists of pairs: the lists_by_element dictionary keeps track of the list each element is in, so we avoid iterating through all the lists and their items every time we need to add a new item.

Here's the code:

def connected_tuples(pairs):
    # for every element, we keep a reference to the list it belongs to
    lists_by_element = {}

    def make_new_list_for(x, y):
        lists_by_element[x] = lists_by_element[y] = [x, y]

    def add_element_to_list(lst, el):
        lst.append(el)
        lists_by_element[el] = lst

    def merge_lists(lst1, lst2):
        merged_list = lst1 + lst2
        for el in merged_list:
            lists_by_element[el] = merged_list

    for x, y in pairs:
        xList = lists_by_element.get(x)
        yList = lists_by_element.get(y)

        if not xList and not yList:
            make_new_list_for(x, y)

        if xList and not yList:
            add_element_to_list(xList, y)

        if yList and not xList:
            add_element_to_list(yList, x)            

        if xList and yList and xList != yList:
            merge_lists(xList, yList)

    # return the unique lists present in the dictionary
    return set(tuple(l) for l in lists_by_element.values())

And here's how it works: http://ideone.com/tz9t7m

score 3 · Answer 4 · edited Nov 17 '16 at 22:06

Another solution that is more compact than wOlf's but, unlike Eli's, handles merges:

def connected_components(pairs):
    components = []
    for a, b in pairs:
        for component in components:
            if a in component or b in component:
                # `other` is the endpoint that may still live elsewhere
                other = b if a in component else a
                if other not in component:
                    for i, other_component in enumerate(components):
                        if other in other_component: # a and b are in different components: merge
                            component.extend(other_component)
                            del components[i]
                            break # we don't have to look for other components
                    else: # the other endpoint wasn't found in any component
                        component.append(other)
                break # we don't have to look further
        else: # neither a nor b were found
            components.append([a, b])
    return components

Notice that I rely on breaking out of loops when I find an element in a component so that I can use the else clause of the loop to handle the case where the elements are not yet in any component (the else is executed if the loop ended without break).
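
As a small standalone illustration of the `for ... else` behaviour described above (the function name `first_component_containing` is made up for this sketch and is not part of the answer's code):

```python
def first_component_containing(value, components):
    for component in components:
        if value in component:
            found = component
            break  # leaving via break skips the else clause
    else:
        # Runs only when the loop finished without hitting break.
        found = None
    return found


print(first_component_containing(3, [[1, 2], [3, 4]]))
print(first_component_containing(9, [[1, 2], [3, 4]]))
```

The first call finds `[3, 4]` and leaves the loop through `break`; the second exhausts the loop, so the `else` branch runs and the result is `None`.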

Mehdi · Answer 5 · 2019-09-25T14:42:49.900

I came up with 2 different solutions:

The first one, which I prefer, links each record to a parent and then follows the hierarchy upward until it reaches an element that is mapped to itself.

So the code would be:

def build_mapping(input_pairs):
    # Maps each element to its parent; a root is mapped to itself.
    mapping = {}

    def find_root(elt):
        # Navigate up the hierarchy until an element is mapped to itself.
        while mapping[elt] != elt:
            elt = mapping[elt]
        return elt

    for left, right in input_pairs:
        # An element seen for the first time starts as its own parent.
        mapping.setdefault(left, left)
        mapping.setdefault(right, right)

        root_left = find_root(left)
        root_right = find_root(right)

        # Link the two hierarchies only if they are still separate,
        # so that no cycle can ever be created.
        if root_left != root_right:
            mapping[root_right] = root_left

    return mapping


def get_groups(input_pairs):
    mapping = build_mapping(input_pairs)

    groups = {}
    for elt in mapping:
        # Group by the top-level parent, not the direct one.
        root = elt
        while mapping[root] != root:
            root = mapping[root]

        groups.setdefault(root, set()).add(elt)

    return list(groups.values())

So, with the following input:

groups = get_groups([('A', 'B'), ('A', 'C'), ('D', 'A'), ('E', 'F'), 
                     ('F', 'C'), ('G', 'H'), ('I', 'J'), ('K', 'L'), 
                     ('L', 'M'), ('M', 'N')])

We get:

[{'A', 'B', 'C', 'D', 'E', 'F'}, {'G', 'H'}, {'I', 'J'}, {'K', 'L', 'M', 'N'}]

The second, possibly less efficient, solution would be:

def get_groups_second_method(input_pairs):
    groups = []

    for pair in input_pairs:
        left = pair[0]
        right = pair[1]

        left_group = None
        right_group = None
        for i in range(0, len(groups)):
            group = groups[i]

            if left in group:
                left_group = (group, i)

            if right in group:
                right_group = (group, i)

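        if left_group is not None and right_group is not None:
            # Merge only when the two ends sit in different groups;
            # if they already share a group, the pair adds nothing new.
            if left_group[1] != right_group[1]:
                merged = right_group[0].union(left_group[0])
                groups[right_group[1]] = merged
                groups.pop(left_group[1])
            continue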

        if left_group is None and right_group is None:
            new_group = {left, right}
            groups.append(new_group)
            continue

        if left_group is None:
            right_group[0].add(left)
        else:
            left_group[0].add(right)

    return groups

Eli Rose · Answer 6 · 2015-03-11T10:07:30.973

It seems like you have a graph (in the form of a list of edges) that may not be all in one piece ("connected") and you want to divide it up into pieces ("components").

Once we think about it in the language of graphs, we're mostly done. We can keep a list of all the components we've found so far (these will be sets of nodes) and add a node to a set if its partner is already there. Otherwise, we make a new component for this pair.

def graph_components(edges):
    """
    Given a graph as a list of edges, divide the nodes into components.

    Takes a list of pairs of nodes, where the nodes are integers.
    Returns a list of sets of nodes (the components).
    """

    # A list of sets.
    components = []

    for v1, v2 in edges:
        # See if either end of the edge has been seen yet.
        for component in components:
            if v1 in component or v2 in component:
                # Add both vertices -- duplicates will vanish.
                component.add(v1)
                component.add(v2)
                break
        else:
            # If neither vertex is already in a component.
            components.append({v1, v2})

    return components

I've used the weird for ... else construction for the sake of making this one function -- the else gets executed if a break statement was not reached during the for. The inner loop could just as well be a function returning True or False.

EDIT: As Francis Colas points out, this approach is too greedy. Here's a completely different approach (many thanks to Edward Mann for this beautiful DFS implementation).

This approach is based on constructing a graph, then traversing it until we run out of unvisited nodes. It should run in linear time in the size of the input: O(n) to construct the graph, O(n) for all the traversals combined, and I believe O(n) in total for the set differences.

from collections import defaultdict

def dfs(start, graph):
    """
    Does depth-first search, returning a set of all nodes seen.
    Takes: a graph in node --> [neighbors] form.
    """
    visited, worklist = set(), [start]

    while worklist:
        node = worklist.pop()
        if node not in visited:
            visited.add(node)
            # Add all the neighbors to the worklist.
            worklist.extend(graph[node])

    return visited

def graph_components(edges):
    """
    Given a graph as a list of edges, divide the nodes into components.
    Takes a list of pairs of nodes, where the nodes are integers.
    """

    # Construct a graph (mapping node --> [neighbors]) from the edges.
    graph = defaultdict(list)
    nodes = set()

    for v1, v2 in edges:
        nodes.add(v1)
        nodes.add(v2)

        graph[v1].append(v2)
        graph[v2].append(v1)

    # Traverse the graph to find the components.
    components = []

    # We don't care what order we see the nodes in.
    while nodes:
        component = dfs(nodes.pop(), graph)
        components.append(component)

        # Remove this component from the nodes under consideration.
        nodes -= component

    return components

This is exactly the solution I was about to post, except that I didn't use `for..else` but rather a dull boolean flag to indicate whether a new set should be added to the list of sets. Good job! — Frerich Raabe, Mar 11 '15 at 08:05
Well, it's nice and simple but it doesn't merge components (e.g. `[(1, 2), (3, 4), (2, 4)]` will return `[{1, 2, 4}, {3, 4, 2}]` instead of `[{1, 2, 3, 4}]`). — Francis Colas, Mar 11 '15 at 08:38
I've proposed a solution below using list and not sets (but it should be possible with either); it's more complex because we want to handle this case. — Francis Colas, Mar 11 '15 at 08:51
That's true. I did some additional tests and it can happen that items appear in different components. — Arian Pasquali, Mar 11 '15 at 09:28
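
The problem raised in these comments can be reproduced with a short sketch. Here `greedy_components` is a hypothetical name for a copy of the first, set-based version of `graph_components`, renamed so it is not confused with the DFS version; the edge list is the one from Francis Colas's comment.

```python
def greedy_components(edges):
    # Copy of the first version: adds both ends to the first matching set
    # and never merges two existing sets.
    components = []
    for v1, v2 in edges:
        for component in components:
            if v1 in component or v2 in component:
                component.add(v1)
                component.add(v2)
                break
        else:
            components.append({v1, v2})
    return components


edges = [(1, 2), (3, 4), (2, 4)]
result = greedy_components(edges)

# Node 4 ends up in two components because the sets holding 2 and 4
# are never merged when the edge (2, 4) arrives.
components_with_4 = [c for c in result if 4 in c]
print(len(result), len(components_with_4))  # prints "2 2"
```

A correct solution returns a single component containing 1, 2, 3 and 4 for this input.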
