1

Say I have a number 'n' and a table of numbers. I want to choose up to four numbers from the table so that their sum is the closest possible match to n. Given the table's length 'L', the number of combinations an exhaustive search has to go through is (6*L + 11*L^2 + 6*L^3 + L^4)/24.

For example, say I have the variable

n = 100

and the set of numbers

t = {86, 23, 19, 8, 42, 12, 49}

Given this list, the closest combination of four to n is 49 + 23 + 19 + 8 = 99.
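
As a sanity check on the example, here is a minimal exhaustive-search sketch in Lua. It assumes each table entry may be used at most once, and the function name `closest_brute_force` is made up for illustration. It is only a baseline: the number of subsets it visits grows roughly with L^4.

```lua
-- Exhaustive baseline: try every subset of 1..max_terms distinct entries
-- and remember the one whose sum is closest to n.
local function closest_brute_force(n, t, max_terms)
   local best_diff, best_sum, best_pick = math.huge, nil, nil
   local pick = {}   -- values chosen on the current search path

   local function recurse(start, sum)
      if #pick > 0 then
         local diff = math.abs(n - sum)
         if diff < best_diff then
            best_diff, best_sum = diff, sum
            best_pick = {}
            for i, v in ipairs(pick) do best_pick[i] = v end
         end
      end
      if #pick == max_terms then return end
      -- Only look at later indices so that no entry is used twice.
      for i = start, #t do
         pick[#pick + 1] = t[i]
         recurse(i + 1, sum + t[i])
         pick[#pick] = nil
      end
   end

   recurse(1, 0)
   return best_sum, best_pick
end

local sum, pick = closest_brute_force(100, {86, 23, 19, 8, 42, 12, 49}, 4)
print(sum .. ' = ' .. table.concat(pick, '+'))
```

For the sample data the closest reachable sum is 99; no subset of up to four entries hits 100 exactly.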

What is the optimal way of doing this with the least possible number of calculations?

Waffle
  • 3
    [Knapsack ?](http://en.wikipedia.org/wiki/Knapsack_problem) – Egor Skriptunoff Mar 26 '13 at 19:32
  • 1
    In a normal Knapsack problem the maximum number of items you pick isn't limited, while in your problem there seems to be a limit of four. I'd still use the same approach as for 0/1 knapsack (dynamic programming). With this approach you can solve it in `O(4nL)`, which is a lot faster as soon as you have more than a few items in t. – Yexo Mar 26 '13 at 19:56
  • 1
    If the exhaustive search algorithm is too slow, try **branch and bound** so you can dismiss swathes of subsets without trying them. Read chapter 13 http://www.statslab.cam.ac.uk/~rrw1/mor/s2010a4.pdf – Colonel Panic Mar 26 '13 at 21:09
  • Problem says up to four numbers. 49 + 42 + 8 = 99. – JackCColeman Aug 01 '13 at 06:34

2 Answers

2

This looks like a variation of the 'subset sum' problem (see: http://en.wikipedia.org/wiki/Subset_sum_problem), which is known to be NP-complete, so unfortunately there most probably won't be any clever algorithm that runs faster than exponential in the number of items in the worst case.

If there are not many items to check (about 10 or so), you might try a depth-first search, pruning branches as early as possible.

If there are many more items to check, you are probably better off looking for a reasonably good approximation instead of the optimal solution.
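
A depth-first search with pruning along those lines might look like the following sketch. It assumes all numbers are positive and that each one is used at most once; `closest_dfs` is a hypothetical name, and the two bounds are just one simple choice among many.

```lua
-- Depth-first search over subsets of up to max_terms entries, cutting off
-- branches that cannot beat the best difference found so far.
-- Assumption: every value in t is positive.
local function closest_dfs(n, t, max_terms)
   -- Work on a copy sorted in descending order so the bounds are cheap.
   local s = {}
   for i, v in ipairs(t) do s[i] = v end
   table.sort(s, function(a, b) return a > b end)

   local best_diff, best_sum, best_pick = math.huge, nil, {}
   local pick = {}   -- values chosen on the current search path

   local function search(start, sum)
      if #pick > 0 then
         local diff = math.abs(n - sum)
         if diff < best_diff then
            best_diff, best_sum = diff, sum
            best_pick = {}
            for i, v in ipairs(pick) do best_pick[i] = v end
         end
      end
      if #pick == max_terms or best_diff == 0 then return end
      -- Overshoot bound: adding positive terms only moves the sum further
      -- above n, so this branch cannot improve on best_diff.
      if sum - n >= best_diff then return end
      local slots = max_terms - #pick
      for i = start, #s do
         -- Undershoot bound: s[i] is the largest value still available, so
         -- no extension can exceed sum + slots * s[i]. Later values are
         -- smaller still, so the whole loop can stop here.
         if sum + slots * s[i] <= n - best_diff then break end
         pick[#pick + 1] = s[i]
         search(i + 1, sum + s[i])
         pick[#pick] = nil
      end
   end

   search(1, 0)
   return best_sum, best_pick
end

local sum, pick = closest_dfs(100, {86, 23, 19, 8, 42, 12, 49}, 4)
print(sum .. ' = ' .. table.concat(pick, '+'))
```

The amount of work here depends on the table and on how quickly a good candidate is found, not on the magnitude of n. In the worst case it still visits every subset of up to `max_terms` entries.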

mikyra
0

Assuming all numbers are positive integers, it could be done as Yexo pointed out:

local n = 100
local t = {86, 23, 19, 8, 42, 12, 49}
local max_terms = 4
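-- best[terms][k] = {abs_diff, expr}, valid for the first subset_size items of t:
-- the smallest |k - sum| reachable with at most `terms` of those items, and the sum that achieves it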
local best = {[0] = {}}
for k = 1, n do best[0][k] = {k, ''} end
for terms = 0, max_terms do best[terms] = best[0] end
for subset_size = 1, #t do
   local new_best = {}
   for terms = subset_size == #t and max_terms or 0, max_terms do
      new_best[terms] = {}
      for k = subset_size == #t and n or 1, n do
         local b0 = best[terms][k]
         local diff = k - t[subset_size]
         local b1 = terms > 0 and (
            diff > 0 and {
               best[terms-1][diff][1],
               best[terms-1][diff][2]..'+'..t[subset_size]
            } or {math.abs(diff), tostring(t[subset_size])}  -- keep expr a string so :match works on it later
         ) or b0
         new_best[terms][k] = b1[1] < b0[1] and b1 or b0
      end
   end
   best = new_best
end
local expr = best[max_terms][n][2]:match'^%+?(.*)'
print((loadstring or load)('return '..expr)()..' = '..expr)

-- Output
99 = 23+19+8+49
Egor Skriptunoff
  • 1
    'for k = 1, n do' looks even less efficient than going through every possible combination. n could be 10 million. – Waffle Mar 26 '13 at 21:11
  • Is there a way to do this where the work taken to calculate the result is not dependent on n, which could be any number? – Waffle Mar 27 '13 at 00:32
  • If the size of the set of numbers is 1000 then my solution takes about 42 billion iterations. I need a solution that can handle that many numbers with the least possible number of calculations to come to it. – Waffle Mar 27 '13 at 12:13
  • @Waffle - that's why it is better to use the O(nL) algorithm, not the O(L^4) one. – Egor Skriptunoff Mar 27 '13 at 12:55
  • Where is a link for that algorithm, and will it work for what I'm trying to do? I want to find the closest possible subset of up to four, be it exact or not. – Waffle Mar 27 '13 at 18:32
  • @Waffle - the link is in the first comment under your question, section "dynamic programming". My implementation could be improved by not storing intermediate arrays once they are no longer needed (less memory usage, same CPU workload). – Egor Skriptunoff Mar 27 '13 at 19:16
  • The knapsack problem is for calculating the maximum. I need to find the combination nearest to a specific value, not nearest to the total value. – Waffle Mar 27 '13 at 21:08
  • @Waffle - That could be done with only minor changes in the algorithm. Did you understand the idea? – Egor Skriptunoff Mar 27 '13 at 21:29
  • The weight of each number is 1, so I don't see how it applies to the knapsack problem more than it does the subset sum problem. If I wanted to find the largest subset of four, I would choose the four largest numbers in the set. Changing the formula to look for a specific number rather than the largest possible number is not as simple. – Waffle Mar 27 '13 at 21:33
  • @Waffle - Look at code in my answer. It finds the closest possible subset of up to four, as you want. Test it. – Egor Skriptunoff Mar 28 '13 at 02:47