
I am not even sure if this can be done in polynomial time.

Problem:

Given two arrays of real numbers,

A = (a[1], a[2], ..., a[n]), 
B = (b[1], b[2], ..., b[n]),  (b[j] > 0, j = 1, 2, ..., n)

and a number k, find a subset A' of A (A' = (a[i(1)], a[i(2)], ..., a[i(k)])) containing exactly k elements such that (sum a[i(j)])/(sum b[i(j)]) is maximized, where both sums run over j = 1, 2, ..., k.

For example, if k == 3, and {a[1], a[5], a[7]} is the result, then

(a[1] + a[5] + a[7])/(b[1] + b[5] + b[7])

should be at least as large as the ratio for any other combination of three indices. Any clue?

Geni
  • I'd guess this is NP-hard with 99.99% probability, but may I ask where you came across this problem? All in all, it's a really nice question. – Saeed Amiri Nov 11 '11 at 23:34
  • Thanks for your reply. I reduced a real load balancing problem to this abstract version. I have spent more than two days on this problem. Now, I also have a feeling that it is NP-hard. – Geni Nov 11 '11 at 23:45
  • There are `n` choose `k` possible ratios, so that sets the upper bound on complexity. I was considering a way to pick the largest ratio `a[i]/b[i]` to start, then pick the index that makes the `k=2` case as large as possible. This way you have to compare `n-1` ratios on that step. Then continue by picking the third index. Proving that this will always give the best ratio once you've picked `k` indices may be hard (or it may not be true!), but trying to prove may offer some insight. – JohnPS Nov 12 '11 at 00:02
  • Hi John, thank you very much for your answer. I tried to prove the correctness of your algorithm yesterday, but ended up with a counterexample. Check this: A = [10, 2, 1, 0.2], B = [7, 3, 2, 1.34], and k = 3. – Geni Nov 12 '11 at 00:25
  • Can a[i] and b[i] be zero or negative? – rettvest Nov 12 '11 at 00:30
  • @Geni - I suspected a counterexample like this. – JohnPS Nov 12 '11 at 00:36
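
The counterexample from the comments can be checked mechanically. The following is a minimal sketch, not code from the thread: `ratio`, `greedy`, and `brute_force` are hypothetical helper names, and `greedy` is one reading of the step-by-step strategy JohnPS describes (at each step, add the index that maximizes the ratio of the set chosen so far).

```python
from itertools import combinations


def ratio(a, b, idx):
    """Ratio sum(a[i]) / sum(b[i]) over the indices in idx."""
    return sum(a[i] for i in idx) / sum(b[i] for i in idx)


def greedy(a, b, k):
    """Grow the subset one index at a time, always taking the best next ratio."""
    chosen = []
    for _ in range(k):
        rest = [i for i in range(len(a)) if i not in chosen]
        chosen.append(max(rest, key=lambda i: ratio(a, b, chosen + [i])))
    return sorted(chosen)


def brute_force(a, b, k):
    """Try all n-choose-k subsets; only practical for small inputs."""
    return list(max(combinations(range(len(a)), k),
                    key=lambda idx: ratio(a, b, idx)))


A = [10, 2, 1, 0.2]
B = [7, 3, 2, 1.34]
print(greedy(A, B, 3), ratio(A, B, greedy(A, B, 3)))
print(brute_force(A, B, 3), ratio(A, B, brute_force(A, B, 3)))
```

With 0-based indices, the greedy choice is {0, 2, 3} with ratio about 1.08317, while the exhaustive search finds {0, 1, 2} with ratio 13/12, about 1.08333, so the greedy strategy is not optimal on this input.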

2 Answers


Assuming that the entries of B are positive (it sounds as though this special case might be useful to you), there is an O(n^2 log n) algorithm.

Let's first solve the problem of deciding, for a particular t, whether there exists a solution such that

(sum a[i(j)])/(sum b[i(j)]) >= t.

Clearing the denominator (which is positive, because every b[i] > 0), this condition is equivalent to

sum (a[i(j)] - t*b[i(j)]) >= 0.

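Because every candidate subset has exactly k elements, all we have to do is choose the k indices i with the largest values of a[i] - t*b[i]; a solution exists if and only if those k values sum to at least 0.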
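
As a minimal sketch of this decision step, assuming all b[i] > 0 (the function name `exists_ratio_at_least` is made up for illustration):

```python
import heapq


def exists_ratio_at_least(a, b, k, t):
    """True if some k-subset has sum(a) / sum(b) >= t, assuming all b[i] > 0."""
    # Score each index by a[i] - t*b[i]; the best subset takes the k largest scores.
    scores = [ai - t * bi for ai, bi in zip(a, b)]
    return sum(heapq.nlargest(k, scores)) >= 0


A = [10, 2, 1, 0.2]
B = [7, 3, 2, 1.34]
print(exists_ratio_at_least(A, B, 3, 1.08))
print(exists_ratio_at_least(A, B, 3, 1.09))
```

On the example from the question's comments, the best achievable ratio is 13/12 (about 1.0833), so the test succeeds for t = 1.08 and fails for t = 1.09.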

Now, in order to solve the problem when t is unknown, we use a kinetic algorithm. Think of t as being a time variable; we are interested in the evolution of a one-dimensional physical system with n particles having initial positions A and velocities -B. Each particle crosses each other particle at most once, so the number of events is O(n^2). In between crossings, the optimum of sum (a[i(j)] - t*b[i(j)]) changes linearly, because the same subset of k is optimal. The best ratio is the value of t at which this optimum reaches zero, and the subset that is optimal at that time is the answer.
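
The sketch below is one possible way to turn this idea into code, not the answer's own implementation. Instead of a true kinetic sweep it sorts the O(n^2) crossing times and binary-searches them with the decision test, which relies on the optimum being strictly decreasing in t when all b[i] > 0. The name `best_k_subset` and the use of `Fraction` for exact arithmetic are assumptions made for illustration.

```python
from fractions import Fraction
import heapq


def best_k_subset(a, b, k):
    """Return (best ratio, sorted indices) for a k-subset, assuming all b[i] > 0."""
    n = len(a)
    a = [Fraction(x) for x in a]
    b = [Fraction(x) for x in b]

    def top_k(t):
        # Indices of the k particles with the largest position a[i] - t*b[i].
        return heapq.nlargest(k, range(n), key=lambda i: a[i] - t * b[i])

    def optimum(t):
        return sum(a[i] - t * b[i] for i in top_k(t))

    # Particles i and j meet when a[i] - t*b[i] == a[j] - t*b[j].
    events = sorted({(a[i] - a[j]) / (b[i] - b[j])
                     for i in range(n) for j in range(i + 1, n)
                     if b[i] != b[j]})

    # optimum(t) decreases in t, so events with optimum >= 0 form a prefix.
    lo, hi = 0, len(events)
    while lo < hi:
        mid = (lo + hi) // 2
        if optimum(events[mid]) >= 0:
            lo = mid + 1
        else:
            hi = mid

    # Pick any time strictly inside the interval that contains the root.
    if not events:
        t = Fraction(0)
    elif lo == 0:
        t = events[0] - 1
    elif lo == len(events):
        t = events[-1] + 1
    else:
        t = (events[lo - 1] + events[lo]) / 2

    chosen = sorted(top_k(t))
    return sum(a[i] for i in chosen) / sum(b[i] for i in chosen), chosen


print(best_k_subset([10, 2, 1, 0.2], [7, 3, 2, 1.34], 3))
```

Sorting the crossing times dominates the running time at O(n^2 log n); the binary search then needs only O(log n) evaluations of the decision test. For float inputs, `Fraction` converts each value exactly, so the comparisons inside the search are free of rounding error.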

Per
  • I suspect that there's an `O(n^2)` algorithm that uses arrangements of lines and the computational-geometric machinery that has been developed to handle them. – Per Nov 12 '11 at 00:34
  • +1 Excellent answer. I never saw something like this. I'm curious if this method can be applied to a larger class of problems and what this type of solution is called...or did you pull this out of thin air? – JohnPS Nov 13 '11 at 05:26
  • @JohnPS Kinetic algorithms are a design technique favored by computational geometers. They're applicable when you have a continuous parameter (interpreted as time) and a "physical system" that changes nicely except at finitely many times. – Per Nov 13 '11 at 14:22

If B can contain negative numbers, then this is NP-hard.

This follows from the NP-hardness of the following problem:

Given k and an array B, is there a subset of B of size k that sums to zero?

If such a subset exists, the denominator vanishes, so A becomes immaterial in that case.

Of course, from your comment it seems that B contains only positive numbers.

user127.0.0.1
  • Hi! Thanks for your reply. Sorry, I forgot to mention that B contains only positive numbers. – Geni Nov 12 '11 at 00:58