
I've seen plenty of threads on how to find all combinations that add up to a number with one list, but I wanted to know how to extend this to the case where you pick exactly one number from each list in a list of lists.

Question:
You must select one number from each list; how do you find all combinations that sum to N?

Given:
3 lists of differing fixed lengths [e.g. l1 will always have 6 values, l2 will always have 10 values, etc.]:

l1 = [0.013,0.014,0.015,0.016,0.017,0.018]
l2 = [0.0396,0.0408,0.042,0.0432,0.0444,0.045,0.0468,0.048,0.0492,0.0504]
l3 = [0.0396,0.0408]

Desired Output:
If N = 0.0954 then the output is [0.015, 0.0396, 0.0408], [0.015, 0.0408, 0.0396].

What I have tried:

output = sum(list(product(l1,l2,l3,l4,l5,l6,l7,l8)))

However, this is too intensive, as my largest bucket has 34 values, creating too many combinations.

Any help/tips on how to approach this in a more efficient manner would be greatly appreciated!

Curious Student
  • It can't be too intensive: it doesn't work. You can't sum a list of tuples. – chrslg Nov 22 '22 at 19:50
  • It's impossible to actually test your scenario if you don't include the rest of the lists. Maybe come up with a smaller example scenario? – Samwise Nov 22 '22 at 19:51
  • Besides, if it were working, why convert the result of `product` into a list? Why not pass it directly to sum? The whole point of itertools is to provide iterators, not lists. Lists need to be built in memory; iterators don't. You just iterate them. – chrslg Nov 22 '22 at 19:51
  • @Samwise edited to include a smaller sample size – Curious Student Nov 22 '22 at 19:59
  • This might help: `answer = list(combo for combo in itertools.product(l1, l2, l8) if math.isclose(sum(combo), target))` – Robᵩ Nov 22 '22 at 19:59
  • @chrslg can you clarify your comment? I don't fully understand – Curious Student Nov 22 '22 at 19:59
  • worst case complexity is always going to be `O(product of list lengths)`, but there are some heuristics to cut off the most obvious mismatches. E.g., you can produce upper and lower bounds from a partial sum using min/max values from the remaining lists, and discard a good portion of variants – Marat Nov 22 '22 at 20:01
  • My first comment means that your trial is failing, so I wouldn't call it 'intensive'. My second comment means that you should not transform an iterator into a list. One easy example of an iterator everybody uses: in python you can do this `s=0` `for i in range(10000000000): s+=i`. It will take time, but won't take memory, because `range` is an iterator. It enumerates the first 10 billion numbers without actually building a list of those 10 billion numbers in memory. – chrslg Nov 22 '22 at 20:03
  • But your version would be like replacing that simple code with: `s=0` `for i in list(range(10000000000)): s+=i`. It does the same computation, but builds an awfully big list of 10 billion numbers that you don't really need. – chrslg Nov 22 '22 at 20:04
  • Likewise, `[k for k in itertools.product(l1,l2,l3,l4,l5,l6,l7,l8) if math.isclose(sum(k),1.05)]` enumerates all combinations of 8 numbers, one taken from each list, without actually building a list of all those combinations; only, because of the outer `[` `]`, a list of those among them whose sum is 1.05. But coding `[k for k in list(itertools.product(l1,l2,l3,l4,l5,l6,l7,l8)) if math.isclose(sum(k),1.05)]` actually builds the huge list of all possible 8-tuples before filtering it down to those whose sum is 1.05. – chrslg Nov 22 '22 at 20:08
  • Btw, both solutions are bad ideas. The first one works, but really not optimally. The second one would exhaust your memory if you tried to run it (other than that, it would work, if you had terabytes of memory). – chrslg Nov 22 '22 at 20:09
  • The real solution uses Branch&Bound. At least I would say so. But it is hard to be sure, because this is obviously an assignment or a test or contest of some sort. And assignments, tests, contests, ... are relative to what you are supposed to know. Have you ever heard of Branch&Bound in the lesson associated with this exercise? If yes, that is probably what you are expected to do. If no... well, then it would help to know what you are studying, because brute force won't lead to a practical solution here. – chrslg Nov 22 '22 at 20:11
  • Also: you may get false results due to floating point math limitations (e.g. `0.1 + 0.1 + 0.1` and `0.3` are not exactly equal; see the snippet after these comments) – VPfB Nov 22 '22 at 20:13
  • @chrslg haha not an assignment/test if you can believe it - an actual work problem I am facing. Probably should have paid more attention to my math classes – Curious Student Nov 22 '22 at 20:28
  • To be clear, I am not judging or anything. It is not like I need to be convinced; I have no problem replying to assignment/test questions either. Just, if it had been an assignment (and it sure looked like it, with your 'student' pseudonym and the seemingly scholarly example), it would have helped, before replying with a Branch&Bound solution, to know whether you are supposed to know about it. If it is a work problem, then I suppose you are supposed to know about everything (in theory... engineers are supposed to know everything, and to learn the rest if needed). – chrslg Nov 22 '22 at 20:32
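
A quick illustration of the floating-point caveat raised in the comments above (standard library only):

import math

print(0.1 + 0.1 + 0.1 == 0.3)              # False: binary floats cannot represent 0.1 exactly
print(math.isclose(0.1 + 0.1 + 0.1, 0.3))  # True: compare within a tolerance instead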

3 Answers


My solution

So here is my attempt with Branch&Bound:


def bb(target):
    L = [l1, l2, l3, l4, l5, l6, l7, l8]
    mn = [min(l) for l in L]  # minimum of each list
    mx = [max(l) for l in L]  # maximum of each list
    return bbrec([], target, L, mn, mx)

eps = 1e-9  # tolerance for floating point comparisons

def bbrec(sofar, target, L, mn, mx):
    if len(L) == 0:
        # all numbers chosen: sofar is a solution iff the remaining target is ~0
        if -eps < target < eps: return [sofar]
        return []
    # bound checks: prune the whole branch if even the smallest possible
    # remaining sum overshoots the target, or the largest undershoots it
    if sum(mn) > target + eps: return []
    if sum(mx) < target - eps: return []
    res = []
    for x in L[0]:
        res += bbrec(sofar + [x], target - x, L[1:], mn[1:], mx[1:])
    return res

Note that it is clearly not optimized. For example, it might be faster, to avoid list appending, to deal with an 8-element list from the start (for example, for sofar, filled with None slots at the beginning; a sketch of that idea follows below). Or to create an iterator (yielding results when we find some, rather than appending them).
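
For instance, a minimal sketch of that preallocation idea (not benchmarked; it reuses the same l1...l8 globals and eps as above) could look like this:

def bb_prealloc(target):
    L = [l1, l2, l3, l4, l5, l6, l7, l8]
    mn = [min(l) for l in L]
    mx = [max(l) for l in L]
    sofar = [None] * len(L)  # preallocated slots, filled in place
    res = []
    def rec(i, target):
        if i == len(L):
            if -eps < target < eps:
                res.append(sofar.copy())  # copy only for actual solutions
            return
        # same bound checks, on the remaining suffix of lists
        if sum(mn[i:]) > target + eps: return
        if sum(mx[i:]) < target - eps: return
        for x in L[i]:
            sofar[i] = x
            rec(i + 1, target - x)
    rec(0, target)
    return res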

But as is, it is already 40 times faster than the brute force method on my generated data (giving the exact same result). Which is something, considering that this is pure python, while brute force can use my beloved itertools (which is python also, of course, but whose iterations are done faster, since they happen inside the implementation of itertools, not in python code).

And I must confess brute force was faster than expected. But still 40 times too slow.

Explanation

General principle of branch and bound is to enumerate all possible solutions recursively (the reasoning being "there are len(l1) sorts of solutions: those containing l1[0], those containing l1[1], ...; and among the first category, there are len(l2) sorts of solutions, ..."). Which, so far, is just another implementation of brute force. Except that during recursion, you can cut whole branches (whole subsets of all candidates) when you know that finding a solution is impossible from where you are.

It is probably clearer with an example, so let's use yours.

bbrec is called with

  • a partial solution (starting with an empty list [], and ending with a list of 8 numbers)
  • a target for the sum of remaining numbers
  • a list of lists from which we must take numbers (so at the beginning, your 8 lists; once we have chosen the 1st number, the 7 remaining lists; etc.)
  • a list of minimum values of those lists (8 numbers at first, being the 8 minimum values)
  • a list of maximum values

It is called at first with ([], target, [l1,...,l8], [min(l1),...,min(l8)], [max(l1),...,max(l8)])

And each call is supposed to choose a number from the first list, and call bbrec recursively to choose the remaining numbers.

The eighth recursive call will be done with sofar being a list of 8 numbers (a solution, or candidate), target being what we have to find in the rest (and since there is no rest, it should be 0), and L, mn, and mx empty lists. So when we see that we are in this situation (that is, len(L)==len(mn)==len(mx)==0, or len(sofar)==8; any of those 4 criteria are equivalent), we just have to check if the remaining target is 0. If so, then sofar is a solution. If not, then sofar is not a solution.

If we are not in this situation, that is, if there are still numbers to choose for sofar, bbrec just chooses the first number by iterating over all possibilities from the first list, and, for each of those, calls itself recursively to choose the remaining numbers.

But before doing so (and those are the 2 lines that make B&B useful; otherwise it would just be a recursive implementation of the enumeration of all 8-tuples for 8 lists), we check if there is at least a chance to find a solution there.

For example, if you are calling bbrec([1,2,3,4], 12, [[1,2,3],[1,2,3],[5,6,7],[8,9,10]], [1,1,5,8], [3,3,7,10]) (note that mn and mx are redundant information: they are just the min and max of the lists; but there is no need to compute those min and max over and over again)

So, if you are calling bbrec like this, that means that you have already chosen 4 numbers, from the 4 first lists. And you need to choose 4 other numbers, from the 4 remaining lists that are passed as the 3rd argument.

And the total of the 4 numbers you still have to choose must be 12.

But you also know that any combination of 4 numbers from the 4 remaining lists will sum to a total between 1+1+5+8=15 and 3+3+7+10=23.

So, no need to even bother enumerating all the solutions starting with [1,2,3,4] and continuing with 4 numbers chosen from [1,2,3], [1,2,3], [5,6,7], [8,9,10]. It is a lost cause: no choice of the remaining 4 numbers will result in a total of 12 anyway (they will all total at least 15).
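
In code, that is exactly the first bound check firing:

sofar, target = [1, 2, 3, 4], 12
mn, mx = [1, 1, 5, 8], [3, 3, 7, 10]
print(sum(mn), sum(mx))        # 15 23
print(sum(mn) > target + eps)  # True: the whole branch is pruned, nothing enumerated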

And that is what explains why this algorithm can beat an itertools-based solution by a factor of 40, using only naive manipulation of lists, and for loops.

Brute force solution

If you want to compare for yourself on your example, here is the brute force solution (already given in the comments):

import itertools
import math

def brute(target):
    return [k for k in itertools.product(l1,l2,l3,l4,l5,l6,l7,l8) if math.isclose(sum(k), target)]

Generator version

Not really faster. But at least, if the idea is not to build a list of all solutions but to iterate through them, that version allows you to do so (and it is very slightly faster). And since we talked about generators vs lists in the comments...

eps = 1e-9

def bb(target):
    L = [l1, l2, l3, l4, l5, l6, l7, l8]
    mn = [min(l) for l in L]
    mx = [max(l) for l in L]
    return list(bbit([], target, L, mn, mx))

def bbit(sofar, target, L, mn, mx):
    if len(L) == 0:
        if -eps < target < eps:
            yield sofar
        return
    # same bound checks as before, but yielding instead of building lists
    if sum(mn) > target + eps: return
    if sum(mx) < target - eps: return
    for x in L[0]:
        yield from bbit(sofar + [x], target - x, L[1:], mn[1:], mx[1:])

Here, I use it just to build a list (so, no advantage over the first version).

But if you wanted to just print solutions, for example, you could

for sol in bbit([], target, L, mn, mx):
    print(sol)

Which would print all solutions, without building any list of solutions.

Example lists

Just for btilly, or those who would like to test their method against the same lists I've used, here are the ones I've chosen:

import numpy as np

l1=list(np.arange(0.013, 0.019, 0.001))
l2=list(np.arange(0.0396, 0.0516, 0.0012))
l3=[0.0396, 0.0498]
l4=list(np.arange(0.02, 0.8, 0.02))
l5=list(np.arange(0.001, 0.020, 0.001))
l6=list(np.arange(0.021, 0.035, 0.001))
l7=list(np.arange(0.058, 0.088, 0.002))
l8=list(np.arange(0.020, 0.040, 0.005))
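
To reproduce the comparison (a sketch; with these lists and target 0.2716, brute takes ~45 s and bb ~1 s on my machine, both finding the same 19308 solutions):

import math, time

t0 = time.perf_counter()
sols = bb(0.2716)
print(len(sols), "solutions in", round(time.perf_counter() - t0, 2), "s")  # 19308 solutions
assert all(math.isclose(sum(s), 0.2716) for s in sols)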
chrslg
  • I would test mine against yours but I don't see where the `l1, l2, ...` came from. – btilly Nov 22 '22 at 21:44
  • @btilly just generated them with `np.arange`. (np.arange, not random, because with random floats, there would never be 2 solutions with the exact same total. And with random ints, it would not be exactly the same problem, since it would avoid the need for a tolerance margin when comparing numbers.) So I've chosen `l4=list(np.arange(0.02, 0.8, 0.02))` and `l8=list(np.arange(0.020, 0.040, 0.005))` and things like that, just out of my first ideas. I have 19308 working combinations out of 80 million combinations. – chrslg Nov 22 '22 at 21:50
  • @btilly: oops. In my (now deleted) comment, I was referring to Marat's solution. Yours wasn't posted yet. – chrslg Nov 22 '22 at 21:55

Non-recursive solution:

from itertools import accumulate, product
from sys import float_info

def test(lists, target):
    # will return a list of 2-tuples, containing sum and elements making it
    convolutions = [(0,())]
    # lower_bounds[i] - what is the least gain we'll get from remaining lists
    lower_bounds = list(accumulate(map(min, lists[::-1])))[::-1][1:] + [0]
    # upper_bounds[i] - what is the max gain we'll get from remaining lists
    upper_bounds = list(accumulate(map(max, lists[::-1])))[::-1][1:] + [0]
    for l, lower_bound, upper_bound in zip(lists, lower_bounds, upper_bounds):
        convolutions = [
            # update sum and extend the list for viable candidates
            (accumulated + new_element, elements + (new_element,))
            for (accumulated, elements), new_element in product(convolutions, l)
            if lower_bound - float_info.epsilon <= target - accumulated - new_element <= upper_bound + float_info.epsilon
        ]

    return convolutions

Output of test([l1, l2, l3], 0.0954), with the question's lists and target:

[(0.09540000000000001, (0.015, 0.0396, 0.0408)),
 (0.09540000000000001, (0.015, 0.0408, 0.0396))]

This can be further optimized by sorting the lists and slicing them based on the upper/lower bounds using bisect:

from bisect import bisect_left, bisect_right
# ...

convolutions = [
    (partial_sum + new_element, partial_elements + (new_element,))
    for partial_sum, partial_elements in convolutions
    for new_element in l[bisect_left(l, target-upper_bound-partial_sum-float_info.epsilon):bisect_right(l, target-lower_bound-partial_sum+float_info.epsilon)]
]
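
Assembled, the sliced version looks like this (a sketch combining the two snippets above; note that bisect requires each list to be sorted, hence the sorted calls):

from bisect import bisect_left, bisect_right
from itertools import accumulate
from sys import float_info

def test_bisect(lists, target):
    lists = [sorted(l) for l in lists]  # bisect requires sorted input
    convolutions = [(0, ())]
    lower_bounds = list(accumulate(map(min, lists[::-1])))[::-1][1:] + [0]
    upper_bounds = list(accumulate(map(max, lists[::-1])))[::-1][1:] + [0]
    for l, lower_bound, upper_bound in zip(lists, lower_bounds, upper_bounds):
        convolutions = [
            (partial_sum + new_element, partial_elements + (new_element,))
            for partial_sum, partial_elements in convolutions
            for new_element in l[
                bisect_left(l, target - upper_bound - partial_sum - float_info.epsilon):
                bisect_right(l, target - lower_bound - partial_sum + float_info.epsilon)
            ]
        ]
    return convolutions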
Marat
  • That is really working well. I would have sworn that no non-recursive solution could both work in reasonable time and be quite readable (it is always possible to convert a recursive algorithm into an iterative one, of course; but not always into a readable one). Just to understand it: what you are doing is replacing a list of partial solutions with a list of (less) partial solutions built over them. I mean, building a list of 5-tuple partial solutions from a list of 4-tuple partial solutions... Is that correct? Skipping those that have no chance to lead to a correct 8-tuple solution. – chrslg Nov 22 '22 at 22:09
  • The only advantage my solution has over this one (since this one is faster; it is roughly the same algorithm, but this one relies on some itertools magic) is that list, which uses some memory. Could it not be an iterator? So each loop builds a generator from the previous generator, until you reach the 8th loop. And you get a generator (made of 8 nested generators), with not even one combination built before you start iterating that 8th generator. – chrslg Nov 22 '22 at 22:12
  • @chrslg first comment - yes, it iteratively builds less partial solutions. Second comment - it is not a generator; `convolutions` is an explicit list of partial solutions storing 2-tuples of partial sums and the elements making them. For illustrative purposes, `product` can be replaced with an inner loop, with the same effect (see the sketch after these comments). – Marat Nov 22 '22 at 22:23
  • @marat this works really well! bisect cut run time down by almost 75% – Curious Student Nov 22 '22 at 22:56
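
For reference, here is the update step from the answer above with `product` unrolled into an explicit inner loop, as Marat describes (a fragment meant to replace the list comprehension inside `test`):

# one pass over a list l, growing each viable partial solution by one element
new_convolutions = []
for accumulated, elements in convolutions:
    for new_element in l:
        if lower_bound - float_info.epsilon <= target - accumulated - new_element <= upper_bound + float_info.epsilon:
            new_convolutions.append((accumulated + new_element, elements + (new_element,)))
convolutions = new_convolutions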

And here is a straightforward dynamic programming solution. I build a data structure which has the answer, and then generate the answer from that data structure.

from dataclasses import dataclass
from decimal import Decimal
from typing import Any

@dataclass
class SummationNode:
    value: Decimal             # number chosen from the current list (None for the root)
    solution_tail: Any = None  # node for the rest of the solution (previous lists)
    next_solution: Any = None  # alternative node that reaches the same total

    def solutions (self):
        if self.value is None:
            yield []
        else:
            for rest in self.solution_tail.solutions():
                rest.append(self.value)
                yield rest

        if self.next_solution is not None:
            yield from self.next_solution.solutions()


def all_combinations(target, *lists):
    # maps each reachable partial total to the ways (nodes) of reaching it
    solution_by_total = {
        Decimal(0): SummationNode(None)
    }

    for l in lists:
        old_solution_by_total = solution_by_total
        solution_by_total = {}
        for x_raw in l:
            x = Decimal(str(x_raw)) # Deal with rounding.
            for prev_total, prev_solution in old_solution_by_total.items():
                next_solution = solution_by_total.get(x + prev_total)
                solution_by_total[x + prev_total] = SummationNode(
                    x, prev_solution, next_solution
                    )
    return solution_by_total.get(Decimal(str(target)))

l1 = [0.013,0.014,0.015,0.016,0.017,0.018]
l2 = [0.0396,0.0408,0.042,0.0432,0.0444,0.045,0.0468,0.048,0.0492,0.0504]
l3 = [0.0396,0.0408]
for answer in all_combinations(0.0964, l1, l2, l3).solutions():
    print(answer)
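
One caveat: all_combinations returns None (a dict.get miss) when no combination reaches the target, so a defensive call could look like this:

result = all_combinations(0.0964, l1, l2, l3)
for answer in (result.solutions() if result is not None else []):
    print(answer)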

To check that the logic of this matches the others, when rounding errors are fixed, use the following test:

import numpy as np

def arange(start, stop, step):
    return [round(x, 5) for x in list(np.arange(start, stop, step))]

l1=arange(0.013, 0.019, 0.001)
l2=arange(0.0396, 0.0516, 0.0012)
l3=[0.0396, 0.0498]
l4=arange(0.02, 0.8, 0.02)
l5=arange(0.001, 0.020, 0.001)
l6=arange(0.021, 0.035, 0.001)
l7=arange(0.058, 0.088, 0.002)
l8=arange(0.020, 0.040, 0.005)

for answer in all_combinations(0.2716, l1, l2, l3, l4, l5, l6, l7, l8).solutions():
    print([float(x) for x in answer])
btilly
  • So, using the same lists as I used for mine, this solution converges in 21 seconds (compared to 45 for brute force, but 1.12 for mine, and 0.91 for Marat's). But more importantly, it gives only 320 solutions. All good ones (the totals are correct). But there are plenty of missing ones, since brute force, Marat's code and mine all give 19308 solutions from the same lists. I so far don't know why, but you are ignoring some valid solutions. – chrslg Nov 22 '22 at 22:01
  • @chrslg With what target? And do you have an example of a missing solution? Also with the lists you gave, I get a solution in 1.45 seconds. Is my computer just that much faster? – btilly Nov 23 '22 at 01:26
  • See the end of my answer for the 8 lists I've used, with target `0.2716`. Since there are 19308 solutions, I obviously haven't checked them for redundancies or things like that. But I don't see how there could be a redundancy. And all of them sum up to 0.2716, as your 320 solutions do. – chrslg Nov 23 '22 at 07:56
  • As for computing time, that is surprising indeed. The computer on which I ran this is not very fast, but it is not very slow either. I wouldn't expect it to be 15 times slower than another computer. I'll have to recheck those timings. – chrslg Nov 23 '22 at 07:58
  • For example, `[0.018, 0.048, 0.0396, 0.06, 0.006, 0.022, 0.058, 0.02]` is a correct solution (I've manually checked that 0.018∈l1, 0.048∈l2, etc. ; and the sum is 0.2716). And it is not among the 320 solutions I get using your code. – chrslg Nov 23 '22 at 08:07
  • Do you also get 320 solutions when running your code with my lists and targets? – chrslg Nov 23 '22 at 08:07
  • @chrslg Ah, bug found. Not really a bug. I'm converting to Decimal to avoid roundoff error. With `numpy` you get numbers like `0.017999999999999995` instead of `0.018`. That increases the reachable sums and stops your solution from being a solution to me. Round all numbers to 5 digits and try again. – btilly Nov 23 '22 at 21:02
  • That can't be the reason. It is true that there is no such thing as `0.018` in floating point anyway (not numpy; floating point in general; even if, for display only, python sometimes rounds float numbers): the closest float64 is `0.0179999999999999986399767948341832379810512065887451171875`. But my code, as well as the two others, uses a tolerance margin that is way bigger than any possible numerical error, while being way smaller than the smallest "quantum" of difference that can exist between two possible sums. So, no, there is no way we can accept too many solutions. – chrslg Nov 23 '22 at 21:33
  • That remark would have been correct in the general case: because of the impossibility of using `==` with floats, since `0.1+0.2==0.3` is False, for the same reason, we are forced to compare within an error margin, `0.018-ε – chrslg Nov 23 '22 at 21:38
  • Anyway: I gave you one example. `[0.018, 0.048, 0.0396, 0.06, 0.006, 0.022, 0.058, 0.02]` should be a solution, and it is not among the solutions of your code. The fact that, indeed, our codes show this solution as `[0.0179999999994, 0.048, ...]` is not related. It just shows that the tolerance margin is needed, indeed. But not that it is wrong, since that solution has to be found. – chrslg Nov 23 '22 at 21:41
  • @chrslg It not only can be, but absolutely IS the reason. The problem is that `Decimal(str(...))` has that extra unwanted precision when you pass floats back from numpy, but not when they are simple floats. I just added how to test my code without the rounding issues, and I verified it finds the exact same solutions. On my machine it ran twice as fast, despite using Decimal objects everywhere. – btilly Nov 24 '22 at 01:03
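
To make the rounding issue above concrete: Decimal(str(...)) keeps whatever digits the float prints with, so rounding first restores the intended value:

from decimal import Decimal

x = 0.017999999999999995          # the kind of value numpy's arange produces
print(Decimal(str(x)))            # Decimal('0.017999999999999995')
print(Decimal(str(round(x, 5))))  # Decimal('0.018')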