4

I have two lists of lists:

a = [[0, 1, 5], [2], [3], [4], [6, 7], [8, 9, 10, 11], [12], [13], [14], [15]]
b = [[0, 1], [2, 3], [4], [5], [6, 7], [8, 9, 10, 11], [12], [13, 14], [15]]

How can I find the maximum overlap between the values of the lists and build a new list of lists with this maximum overlap. In other words, I'm looking for a function f which maximizes the list sizes by merging lists with overlap.

The desired result of function f for this example would be:

f(a,b) = [[0, 1, 5], [2, 3], [4], [6, 7], [8, 9, 10, 11], [12], [13, 14], [15]] 
Willem Van Onsem
  • 443,496
  • 30
  • 428
  • 555
elcombato
  • 473
  • 1
  • 4
  • 16

2 Answers2

8

You can use a variant of the disjoint-set structure to solve this problem: for each list [a,b,c] you unify a with b and a with c. You do this for both lists and then derive the resulting roots.

Here there is a simply disjunct-set algorithm we can modify:

from collections import defaultdict

def parent(u,mapping):
    if mapping[u] == u:
        return u
    mapping[u] = parent(mapping[u],mapping)
    return mapping[u]

def relation(array,mapping=None):
    if mapping is None:
        mapping = {}

    for e in array:
        if len(e) > 0:
            u = e[0]
            if u not in mapping:
                mapping[u] = u
            for v in e[1:]:
                if v not in mapping:
                    mapping[v] = v
                mapping[parent(u,mapping)] = parent(v,mapping)
    return mapping

def f(a,b):
    mapping = {}
    relation(a,mapping)
    relation(b,mapping)

    results = defaultdict(set)
    for u in mapping.keys():
        results[parent(u,mapping)].add(u)
    return [list(x) for x in results.values()]

(boldface added for the semantical differences with the original union-set algorithm).

This produces:

>>> f(a,b)
[[2, 3], [4], [0, 1, 5], [6, 7], [8, 9, 10, 11], [12], [13, 14], [15]]

The result is not sorted, since we work with a set. Nevertheless, you can easily sort it on the first element of each tuple if you want by altering f to:

def f(a,b):
    mapping = {}
    relation(a,mapping)
    relation(b,mapping)

    results = defaultdict(set)
    for u in mapping.keys():
        results[parent(u,mapping)].add(u)
    return sorted([list(x) for x in results.values()],key=lambda t:t[0])

which produces:

>>> f(a,b)
[[0, 1, 5], [2, 3], [4], [6, 7], [8, 9, 10, 11], [12], [13, 14], [15]]

The nice thing with this solution is that it also works if there is overlap in a or b itself, and you can easily generalize the solution to work with an arbitrary amount of lists (for instance a, b and c).

Community
  • 1
  • 1
Willem Van Onsem
  • 443,496
  • 30
  • 428
  • 555
0

When I understood it right, the following will do it:

[l for l in a if not any(all(x in l2 for x in l) for l2 in b)] + 
[l for l in b if not any(all(x in l2 for x in l) for l2 in a)] + 
[l for l in a if l in b]

The first term yields all lists in a which are not part of lists in b; the second term yields all lists in b which are note part if lists in a; the third term yield all lists, which are both in a and b.

For your example this yields the following result:

[[0, 1, 5], [2, 3], [13, 14], [4], [6, 7], [8, 9, 10, 11], [12], [15]]
phynfo
  • 4,830
  • 1
  • 25
  • 38
  • 1
    This will not always generate the transitive closure I think. Say you have `a = [[1,2],[3,4],[5,6]]` and `b = [[2,3],[4,5]]` the result is `[[1, 2], [3, 4], [5, 6], [2, 3], [4, 5]]` whereas the expected result is `[[1,2,3,4,5,6]]`... – Willem Van Onsem Mar 06 '17 at 14:52