Python - Group by any matching value

Question

I have a data structure like this

[
  (123, 321),
  (123, 456),
  (999, 654),
  (111, 456),
  (999, 898),
  (111, 654),
  (481, 739),
]

How can I group the tuples together by any matching element? i.e. to get this result (order doesn't matter)

[
  [(123, 321), (123, 456), (111, 456), (111, 654), (999, 654), (999, 898)],
  [(481, 739)],
]

Here's another example:

input = 
  [
    (123, 321),
    (123, 456),
    (111, 456),
    (999, 898),
    (481, 898),
    (481, 549),
  ]

output = 
  [
    [(123, 321), (123, 456), (111, 456)],
    [(999, 898), (481, 898), (481, 549)],
  ]

The example is a little unclear. Do you wish to create two lists, one with tuples whose values occur more than once and the other with tuples whose values only occur once? — Prins, Aug 25 '22 at 11:44
Not quite. I've updated the question with another example. I want to group tuples together where either of the values match the corresponding element in the other tuple. — Mark, Aug 25 '22 at 12:00
I posted my answer before I saw your update. The new example has actually made it more unclear for me. How are you deciding what goes into the first list and what goes into the second list e.g. [481,898] has values that occur more than once, why is this in the second list whilst [123,321] is in the first list? — Prins, Aug 25 '22 at 12:09
Everything in the first list has at least 1 matching value. Tuples 1 and 2 match on index 0, tuples 2 and 3 match on index 1. Similarly, in the second list, tuples 1 and 2 match on index 1, tuples 2 and 3 match on index 0. Therefore grouping together tuples with at least 1 common element. — Mark, Aug 25 '22 at 12:16
Sorry, this is still a little unclear. You said (123, 321) and (123, 456) match on index 0 so are in the first list. The tuples (481, 898) and (481, 549) also match on index 0 but are in the second list. Why? — Prins, Aug 25 '22 at 12:29
We are grouping tuples by a common value in either index. Every tuple in the first list has at least 1 common value with every other tuple in the first list. Every tuple in the second list has at least 1 common value with every other tuple in the second list. Tuples in the first list have no common values with tuples in the second list. — Mark, Aug 25 '22 at 12:32
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/247585/discussion-between-prins-and-mark). — Prins, Aug 26 '22 at 12:45

score 0 · Answer 1 · answered Aug 25 '22 at 12:03

If I have understood the question correctly then you are trying to generate a list that contains two lists as follows:

The first list contains all those tuples where a value in the tuple occurs in any other tuple.
The second list contains all those tuples where the values of the tuple do not occur in any other tuple.

If the above is the case then effectively which are looking to do is determine which values occur more than once and use this information to generate your lists.

One approach is to use Python's Counter class to count up the occurrences of each value. You can do this as follows:

from collections import Counter
occurrences = Counter(e[0] for e in data) + Counter(e[1] for e in data)

This will give you the following:

Counter({123: 2, 999: 2, 111: 2, 456: 2, 654: 2, 481: 1, 321: 1, 898: 1, 739: 1})

Next, you can iterate through your list of tuples and separate them based on whether the values in the tuple occur more than once or not:

grouped = [[], []]

for entry in data:
  if occurrences[entry[0]] > 1 or occurrences[entry[1]] > 1:
    grouped[0].append(entry)
  else:
    grouped[1].append(entry)

This will then give you the desired result:

[
  [(123, 321), (123, 456), (999, 654), (111, 456), (999, 898), (111, 654)],
  [(481, 739)]
]

Thanks but this isn't quite what I'm looking for. See the updated example in the original question. I realise the initial example was a bit unclear. — Mark, Aug 25 '22 at 12:06

score 0 · Answer 2 · answered Aug 25 '22 at 17:58

Following further clarification, the problem can be solved as follows:

Step 1: Start by preparing your data by storing it in a list containing lists of tuples. Each inner list only contains a single tuple. This is the initial case i.e. no groupings.
Step 2: Make a first pass and group the data. This will result in your initial groupings.
Step 3: Keep passing through the data and combining groups until these can no longer be combined.

We can implement these steps as follows:

Step 1: We will use a list comprehension to prepare the data.

# prepare the data by turning it into a list of lists containing tuples
prepared = [[tup] for tup in data]

Step 2: We will create a group function to generate the groups.

def group(entries):
  groups = []
  
  for entry in entries:
    i = existing_group(entry, groups)
    
    # add to existing group or create a new group
    if i == -1:
        groups.append(entry)
    else:
      groups[i] = groups[i] + entry
      
  return groups

The group function simply iterates each entry in our data and checks if it belongs in an existing or new group. We can write an existing_group function to help determine this.

def existing_group(entry, groups):
  for i, group in enumerate(groups):
    
    # flatten
    group_values = sum(group, ())
    entry_values = sum(entry, ())
    
    # find one or more matching values
    if any(value in group_values for value in entry_values):
      return i
      
  return -1

Thus, we can call our group function as follows:

# intiai grouping of data
groups = group(prepared)

Step 3: We can use a loop to keep grouping the data until the groups cannot be changed any further.

# keep grouping the data until the groups are unchanged
num_groups = 0
  
while num_groups != len(groups):
  num_groups = len(groups)
  groups = group(groups)

This will then give the desired result.

Final Solution: The resulting solution is as follows.

def existing_group(entry, groups):
  for i, group in enumerate(groups):
    group_values = sum(group, ())
    entry_values = sum(entry, ())
    if any(value in group_values for value in entry_values):
      return i  
  return -1


def group(entries):
  groups = []
  for entry in entries:
    i = existing_group(entry, groups)
    if i == -1:
        groups.append(entry)
    else:
      groups[i] = groups[i] + entry
  return groups
  


prepared = [[tup] for tup in data]
groups = group(prepared)
num_groups = 0

while num_groups != len(groups):
  num_groups = len(groups)
  groups = group(groups)

print(groups)

Python - Group by any matching value

2 Answers2