
I am trying to solve a problem from biology in which I have to combine local sub-optimal solutions, one from each big element, such that every sub-particle is unique. The problem is that there can be more than 4,000 local sub-optimal solutions per element and more than 30,000 elements. A plain Cartesian product is not an option, since combining the lists is an n*m*p*... problem that quickly becomes impossible to enumerate with itertools alone.

The general schema is:

[
  [ [a,b,c],[d,e,a],[f], ...],
  [ [f,e,t],[a,b,t],[q], ...],
  [ [a,e,f],[],[p], ... up to 4,000],
  ... up to 30,000
]

[ [a,b,c],[d,e,a],[f],.....], -> group of sub-optimal solutions for elem. #1

I want to find as fast as possible

  • First: one solution, that is, a combination of one sub-optimal solution from each element (blank lists may be included) such that there are no duplicates. For example [[a,b,c],[f,e,t],[p]].

  • Second: all the compatible solutions.

I know this is an open question, but I need some guidance or a general algorithm to approach this problem; I can investigate further once I have something to start with.

I am using Python for the rest of the lab work, but I am open to other languages.

We can start from a basic solver that handles fewer possibilities, both in the total number of sub-optimal solutions and in the number of lists of lists.

Best.

EDIT 1

A very short real example:

[[[1,2,3],[1,2,4],[1,2,5],[5,8]],
[[1,3],[7,8],[6,1]],
[[]],
[[9,10],[7,5],[6,9],[6,10]]]

OPTIMAL SOLUTION (FROM ROW #):

#1 [1,2,3]
#2 [7,8]
#3 [9,10]

Output: [[1,2,3],[7,8],[9,10]]

You can see more here: https://pastebin.com/qq4k2FdW
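
To state the condition explicitly: a combination is compatible when no value appears in more than one of the chosen sub-lists. A minimal sketch of that check (the helper name is mine, for illustration only):

def is_compatible(combination):
    # A combination is valid if no value is shared between its sub-lists
    seen = set()
    for sub in combination:
        for value in sub:
            if value in seen:
                return False
            seen.add(value)
    return True

print(is_compatible([[1, 2, 3], [7, 8], [9, 10]]))  # True
print(is_compatible([[1, 2, 3], [1, 3], [9, 10]]))  # False: 1 and 3 repeat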

  • Hi, welcome to SO. Could you please provide some small sample input, the expected output for it, and a reproducible code sample of your approach, if any? – LazyCoder Aug 01 '19 at 16:17
  • So one row should have no duplicate values? – basilisk Aug 01 '19 at 16:31
  • LazyCoder, thanks for the fast answer. You can now find an example in the body of the question, as well as an optimal result. – Hardin Salvor Aug 01 '19 at 16:32
  • Basilisk, the desired solution has one element from each 'big' list, and between those lists the condition is that all elements are unique. – Hardin Salvor Aug 01 '19 at 16:33
  • By one element, do you mean one row (list or array) or a number? Sorry, I don't see the pattern here; how is it that between those lists the condition is that the elements are unique? Why didn't you use the 6? It was unique. – basilisk Aug 01 '19 at 16:41
  • @basilisk I think the given output is one of many possible solutions. Each output takes a single choice from each row such that, across the whole output, it contains only unique elements. So `[[5,8], [6,1], [9, 10]]` would also be a valid solution – C.Nivs Aug 01 '19 at 16:43
  • We should pick one sub-list from each row, and there mustn't be repeated numbers. I showed one example; when we scale the problem we want only one solution first, and then, if possible, all of them. – Hardin Salvor Aug 01 '19 at 16:45
  • The question still remains, what have you tried so far? Please post any sample code and attempts and where exactly you got stuck – C.Nivs Aug 01 '19 at 16:47
  • I just tried product(*) from the itertools package. As it is a generator, I evaluate each partial solution: if the condition is fulfilled, OK; if not, I move to the next one. I am on the phone now, so I can post the code later. However, I am stuck on the general approach (Cartesian product) more than on that exact code. I am asking for the name of the problem and algorithms to solve it. – Hardin Salvor Aug 01 '19 at 16:53

1 Answer


Here are a couple of algorithms. You can do a brute force search over all possible combinations with itertools very simply:

from itertools import product, chain

def get_compatible_solutions(subsolutions):
    # Try every combination of one sub-solution per element
    for sol in product(*subsolutions):
        # The combination is compatible if no value repeats across the chosen sub-solutions
        if len(set(chain.from_iterable(sol))) == sum(map(len, sol)):
            yield sol

# Test
example = [
    [[1, 2, 3], [1, 2, 4], [1, 2, 5], [5, 8]],
    [[1, 3], [7, 8], [6, 1]],
    [[]],
    [[9, 10], [7, 5], [6, 9], [6, 10]]
]

# Get one solution
print(next(get_compatible_solutions(example)))
# ([1, 2, 3], [7, 8], [], [9, 10])

# Get all solutions
print(*get_compatible_solutions(example), sep='\n')
# ([1, 2, 3], [7, 8], [], [9, 10])
# ([1, 2, 3], [7, 8], [], [6, 9])
# ([1, 2, 3], [7, 8], [], [6, 10])
# ([1, 2, 4], [7, 8], [], [9, 10])
# ([1, 2, 4], [7, 8], [], [6, 9])
# ([1, 2, 4], [7, 8], [], [6, 10])
# ([1, 2, 5], [7, 8], [], [9, 10])
# ([1, 2, 5], [7, 8], [], [6, 9])
# ([1, 2, 5], [7, 8], [], [6, 10])
# ([5, 8], [1, 3], [], [9, 10])
# ([5, 8], [1, 3], [], [6, 9])
# ([5, 8], [1, 3], [], [6, 10])
# ([5, 8], [6, 1], [], [9, 10])

Another possibility is to do a recursive search, one row at a time. This will explore fewer candidate solutions than the full Cartesian product, because once a sub-optimal solution is excluded on a search path, no combination containing it is ever generated.

def get_compatible_solutions(subsolutions):
    # current[i] holds the sub-solution chosen for row i; seen holds all values used so far
    current = [None] * len(subsolutions)
    seen = set()
    yield from _get_compatible_solutions_rec(subsolutions, current, 0, seen)

def _get_compatible_solutions_rec(subsolutions, current, i, seen):
    if i >= len(subsolutions):
        # Every row has a compatible choice: emit the full combination
        yield tuple(current)
    else:
        for subsol in subsolutions[i]:
            # Skip this sub-solution if any of its values was already used
            if any(s in seen for s in subsol):
                continue
            seen.update(subsol)
            current[i] = subsol
            yield from _get_compatible_solutions_rec(subsolutions, current, i + 1, seen)
            # Backtrack: release the values before trying the next sub-solution
            seen.difference_update(subsol)
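
As with the brute-force version, this is a generator, so the first solution can be pulled out with next(); on the example list above it produces the same first result:

# First solution from the recursive search on the same example data
print(next(get_compatible_solutions(example)))
# ([1, 2, 3], [7, 8], [], [9, 10])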
jdehesa
  • I cannot test the code right now, but the brute-force solution is the one I am already using, and it is not working because it is n*m, so it quickly reaches billions of possibilities. Do you think the recursive one could face the same issue? Thanks a lot for the answer. – Hardin Salvor Aug 01 '19 at 17:11
  • @HardinSalvor I ran a synthetic example with up to 5 suboptimal solutions per element, each suboptimal solution containing up to 5 different integers in [0, 99], and with 10 elements in total. The number of compatible solutions was 675,601; the first algorithm explored 9,765,625 candidate full solutions and took 15.3s, while the second one explored 1,666,535 candidate partial solutions and took 1.6s. There is an important speedup, but not one of several orders of magnitude (a sketch of such a synthetic generator follows these comments). – jdehesa Aug 02 '19 at 09:16
  • @HardinSalvor What is the usual range for the number of different sub-optimal solution elements (i.e. the number of distinct values `a`, `b`, `c`, ...)? – jdehesa Aug 02 '19 at 09:22
  • I tested it and it runs quickly for cases with fewer than a billion possibilities. However, our case is in that crazy range (that's why I don't know whether combinatorics is the answer). Each row normally scales to thousands of sub-optimal solutions. – Hardin Salvor Aug 02 '19 at 09:51
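
For anyone wanting to reproduce a test of roughly the size described in the benchmark comment above, here is a possible generator; the exact sizes and distributions are assumptions based on that comment, not the original benchmark code:

import random

def make_synthetic(num_elements=10, max_subsolutions=5, max_size=5, value_range=100, seed=0):
    # Build num_elements rows, each with up to max_subsolutions sub-lists
    # of up to max_size distinct integers drawn from range(value_range)
    rng = random.Random(seed)
    return [
        [rng.sample(range(value_range), rng.randint(1, max_size))
         for _ in range(rng.randint(1, max_subsolutions))]
        for _ in range(num_elements)
    ]

synthetic = make_synthetic()
print(sum(len(row) for row in synthetic))  # total number of sub-optimal solutions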