Maximum subset sum with two arrays

Question

I am not even sure if this can be done in polynomial time.

Problem:

Given two arrays of real numbers,
A = (a[1], a[2], ..., a[n]), 
B = (b[1], b[2], ..., b[n]),  (b[j] > 0, j = 1, 2, ..., n)
and a number k, find a subset A' of A with exactly k elements, A' = (a[i(1)], a[i(2)], ..., a[i(k)]), such that (sum a[i(j)])/(sum b[i(j)]) is maximized, where both sums run over j = 1, 2, ..., k.

For example, if k == 3, and {a[1], a[5], a[7]} is the result, then

(a[1] + a[5] + a[7])/(b[1] + b[5] + b[7])

should be at least as large as the ratio for any other choice of three indices. Any clue?

I'd guess with 99.99% confidence that this is NP-hard, but may I ask where you came across this problem? All in all, it's a really nice question. – Saeed Amiri, Nov 11 '11 at 23:34
Thanks for your reply. I reduced a real load balancing problem to this abstract version. I have spent more than two days on this problem. Now, I also have a feeling that it is NP-hard. — Geni, Nov 11 '11 at 23:45
There are `n` choose `k` possible ratios, so that sets the upper bound on complexity. I was considering a way to pick the largest ratio `a[i]/b[i]` to start, then pick the index that makes the `k=2` case as large as possible. This way you have to compare `n-1` ratios on that step. Then continue by picking the third index. Proving that this will always give the best ratio once you've picked `k` indices may be hard (or it may not be true!), but trying to prove may offer some insight. — JohnPS, Nov 12 '11 at 00:02
Hi John, thank you very much for your answer. I tried to prove the correctness of your algorithm yesterday, but ended up with a counterexample. Check this: A = [10, 2, 1, 0.2], B = [7, 3, 2, 1.34], and k = 3. – Geni, Nov 12 '11 at 00:25
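
The counterexample above can be checked with a short script. This is only a sketch: `greedy` is one reading of the index-by-index strategy described in the comments, and `brute_force` simply tries all `n` choose `k` subsets; both function names are made up for illustration and indices are 0-based.

```python
from itertools import combinations


def ratio(a, b, idx):
    """Ratio sum(a[i]) / sum(b[i]) over the given indices."""
    return sum(a[i] for i in idx) / sum(b[i] for i in idx)


def greedy(a, b, k):
    """Add one index at a time, always taking the index that makes
    the ratio of the subset chosen so far as large as possible."""
    chosen = []
    remaining = set(range(len(a)))
    for _ in range(k):
        best = max(remaining, key=lambda i: ratio(a, b, chosen + [i]))
        chosen.append(best)
        remaining.remove(best)
    return sorted(chosen)


def brute_force(a, b, k):
    """Try every subset of size k and keep the one with the largest ratio."""
    best = max(combinations(range(len(a)), k), key=lambda idx: ratio(a, b, idx))
    return list(best)


A = [10, 2, 1, 0.2]
B = [7, 3, 2, 1.34]
k = 3

g = greedy(A, B, k)
opt = brute_force(A, B, k)
print("greedy :", g, ratio(A, B, g))
print("optimal:", opt, ratio(A, B, opt))
```

On this input the greedy choice ends with indices 0, 2, 3 (ratio 11.2/10.34), while the best subset is 0, 1, 2 (ratio 13/12), so the greedy strategy is not optimal in general.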

score 3 · Accepted Answer · answered Nov 12 '11 at 00:26

Assuming that the entries of B are positive (it sounds as though this special case might be useful to you), there is an O(n^2 log n) algorithm.

Let's first solve the problem of deciding, for a particular t, whether there exists a solution such that

(sum a[i(j)])/(sum b[i(j)]) >= t.

Clearing the denominator, this condition is equivalent to

sum (a[i(j)] - t*b[i(j)]) >= 0.

All we have to do is choose the k indices i with the largest values of a[i] - t*b[i] and check whether those k values sum to at least zero.
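
A minimal sketch of this decision step, assuming plain Python lists and floating-point arithmetic (so answers very close to the true optimum may be affected by rounding). The function name `exists_ratio_at_least` is hypothetical.

```python
import heapq


def exists_ratio_at_least(a, b, k, t):
    """Return True if some k-subset has sum(a) / sum(b) >= t.

    Assumes every b[i] > 0, so the condition is equivalent to
    sum(a[i] - t*b[i]) >= 0 over the chosen indices.
    """
    keys = [ai - t * bi for ai, bi in zip(a, b)]
    # The k largest keys give the largest possible sum for this t.
    return sum(heapq.nlargest(k, keys)) >= 0


A = [10, 2, 1, 0.2]
B = [7, 3, 2, 1.34]
print(exists_ratio_at_least(A, B, 3, 1.08))
print(exists_ratio_at_least(A, B, 3, 1.09))
```

For the arrays from the question's comments, the best ratio with k = 3 is 13/12 (about 1.0833), so the first call succeeds and the second does not.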

To solve the problem when t is unknown, we use a kinetic algorithm. Think of t as a time variable: we follow a one-dimensional physical system of n particles, where particle i starts at position a[i] and moves with velocity -b[i], so its position at time t is a[i] - t*b[i]. Each pair of particles crosses at most once, so there are O(n^2) events, and sorting them takes O(n^2 log n) time. Between consecutive crossings the same subset of k particles is optimal, so the optimum of sum (a[i(j)] - t*b[i(j)]) changes linearly in t. Because every b[i] is positive, that optimum is strictly decreasing in t, and the maximum ratio is the value of t at which it reaches zero.
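
The kinetic sweep itself takes some care to implement. As a simpler illustration of the same idea (find the t at which the best value of sum (a[i] - t*b[i]) is zero), here is a sketch that swaps in a different technique, the iteration usually called Dinkelbach's method: guess t, take the k largest values of a[i] - t*b[i], replace t with that subset's ratio, and stop when t no longer increases. This is not the kinetic algorithm and does not carry its running-time bound; the function name `max_ratio_subset` is hypothetical, and floats are assumed.

```python
def max_ratio_subset(a, b, k):
    """Return (best_ratio, sorted_indices) for a k-subset maximizing
    sum(a[i]) / sum(b[i]).

    Assumes every b[i] > 0 and 1 <= k <= len(a).
    """
    n = len(a)

    def ratio(idx):
        return sum(a[i] for i in idx) / sum(b[i] for i in idx)

    subset = list(range(k))  # arbitrary starting subset
    t = ratio(subset)
    while True:
        # Best subset for the current guess t: the k largest a[i] - t*b[i].
        ranked = sorted(range(n), key=lambda i: a[i] - t * b[i], reverse=True)
        # Sorting the indices keeps the float sums reproducible per subset.
        candidate = sorted(ranked[:k])
        new_t = ratio(candidate)
        if new_t <= t:
            # No subset beats ratio t, so the current subset is optimal.
            return t, subset
        t, subset = new_t, candidate


best, idx = max_ratio_subset([0.2, 1, 2, 10], [1.34, 2, 3, 7], 3)
print(best, idx)
```

Each round strictly increases t and there are finitely many subsets, so the loop terminates. On the example it moves from the first three indices to indices 1, 2, 3 with ratio 13/12.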

Per

I suspect that there's an `O(n^2)` algorithm that uses arrangements of lines and the computational-geometric machinery that has been developed to handle them. – Per Nov 12 '11 at 00:34
+1 Excellent answer. I never saw something like this. I'm curious if this method can be applied to a larger class of problems and what this type of solution is called...or did you pull this out of thin air? – JohnPS Nov 13 '11 at 05:26
@JohnPS Kinetic algorithms are a design technique favored by computational geometers. They're applicable when you have a continuous parameter (interpreted as time) and a "physical system" that changes nicely except at finitely many times. – Per Nov 13 '11 at 14:22

score 2 · Answer 2 · answered Nov 12 '11 at 00:26

If B can contain negative numbers, then this is NP-hard.

This follows from the NP-hardness of the following problem:

Given k and an array B, is there a subset of B of size k that sums to zero?

A becomes immaterial in that case.

Of course, from your comment it seems that B contains only positive numbers.

user127.0.0.1

Hi! Thanks for your reply. Sorry, I forgot to mention that B contains only positive numbers. – Geni Nov 12 '11 at 00:58
