
I have a set of floating point values that I want to divide into two sets whose sizes differ by at most one element. Additionally, the difference between the two sets' sums should be minimal. Optionally, if the number of elements is odd and the sums cannot be equal, the smaller set should have the larger sum.

That would be the optimal solution, but I only really need the subset size constraint to be met exactly. The difference of sums doesn't strictly need to be minimal, but it should come close. I would also prefer the smaller set (if any) to have the larger sum.

I realize this may be related to the partition problem, but it's not quite the same, or as strict.

My current algorithm is the following, though I wonder if there's a way to improve upon that:

arbitrarily divide the set into two sets of equal size (or with a size difference of 1)
do
  diffOfSums := sum1 - sum2
  foundBetter := false
  betterDiff := 0.0
  bestPair := none

  foreach pair (value1 from set1, value2 from set2) do
    if |diffOfSums - 2 * betterDiff| > |diffOfSums - 2 * (value1 - value2)| then
      foundBetter := true
      betterDiff := value1 - value2
      bestPair := (value1, value2)
    endif
  done

  if foundBetter then swap the two elements of bestPair between set1 and set2
while foundBetter
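For reference, here is a minimal Python sketch of the swap loop above, assuming the input is a list and using a hypothetical function name `swap_improve`. It records the indices of the best pair during each pass and swaps them afterwards; swapping `a` out of set1 and `b` out of set2 changes `sum1 - sum2` by `-2 * (a - b)`, which is where the factor 2 in the pseudocode comes from.

```python
def swap_improve(values):
    """Pairwise-swap local search: repeatedly swap the one pair
    (a from set1, b from set2) that most reduces |sum1 - sum2|."""
    half = (len(values) + 1) // 2
    # Arbitrary initial split; set1 gets the extra element if the count is odd.
    set1, set2 = list(values[:half]), list(values[half:])
    while True:
        diff = sum(set1) - sum(set2)
        best = None            # indices (i, j) of the best pair found in this pass
        best_abs = abs(diff)   # a swap must beat the current difference
        for i, a in enumerate(set1):
            for j, b in enumerate(set2):
                # Swapping a and b changes sum1 - sum2 by -2 * (a - b).
                new_abs = abs(diff - 2 * (a - b))
                if new_abs < best_abs:
                    best, best_abs = (i, j), new_abs
        if best is None:       # no single swap improves the difference: stop
            return set1, set2
        i, j = best
        set1[i], set2[j] = set2[j], set1[i]
```

Each pass costs O(|set1| · |set2|), i.e. O(n²). Because |sum1 - sum2| strictly decreases with every swap, the loop terminates, but it stops as soon as no single swap helps, so the result is not guaranteed to be optimal. Like the pseudocode, it does not try to leave the larger sum in the smaller set.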

My problem with this approach is that I'm not sure of its actual complexity or whether it can be improved upon. It certainly doesn't fulfill the requirement to leave the smaller subset with the larger sum.

Is there any existing algorithm that happens to do what I want to achieve? And if not, can you suggest ways for me to either improve my algorithm or figure out that it may already be reasonably good for the problem?

Wormbo

2 Answers


It is easy to prove that the partition problem reduces to this problem in polynomial time.

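Imagine you want to solve partition for some array A of length n, but you only know how to solve your problem. You just have to append n zeros to A. The zeros can pad whichever side is short, so the padded array has a split into two equal-size halves with equal sums exactly when A has any equal-sum partition. If you can solve the padded instance with your algorithm, then you have solved the partition problem. This proves your problem to be NP-hard.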
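A tiny brute-force illustration of that reduction, assuming integer inputs (so equality of sums is exact) and using hypothetical helper names. `balanced_equal_split_exists` answers the size-constrained question by exhaustive search, and `partition_via_balanced` answers plain partition by padding with zeros first.

```python
from itertools import combinations

def balanced_equal_split_exists(values):
    """Brute force: can `values` be split into two sets whose sizes differ
    by at most one and whose sums are equal? Exponential; tiny inputs only."""
    n, total = len(values), sum(values)
    k = n // 2  # size of one side; the other side gets n - k elements
    for idx in combinations(range(n), k):
        if 2 * sum(values[i] for i in idx) == total:
            return True
    return False

def partition_via_balanced(a):
    """Reduction from the answer: append len(a) zeros, then ask the
    size-constrained question. The zeros fill up whichever side is smaller."""
    return balanced_equal_split_exists(list(a) + [0] * len(a))
```

For example, `[3, 1, 1, 1]` has the partition `{3}` / `{1, 1, 1}`, which has no equal-size counterpart, yet the padded instance is solvable because three zeros can join the `3`.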

But you'll see you can't reduce this problem to partition (i.e. it isn't NP-complete), unless you limit the precision of your floats. In that case the same algorithm would solve both.

In the general case, the best you can do is backtrack.
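A minimal backtracking sketch for the exact version, under the assumption that inputs are small enough for exponential worst-case time. The function name `best_balanced_split` is hypothetical. It chooses the `n // 2` elements of set A (the smaller set when n is odd), minimizes the difference of sums, breaks ties in favor of A having the larger sum, and prunes branches that can no longer fill A.

```python
def best_balanced_split(values):
    """Exact backtracking search: pick the n // 2 elements of set A so that
    |sum(A) - sum(B)| is minimal; among ties, prefer sum(A) >= sum(B).
    Exponential in the worst case, so only practical for small inputs."""
    vals = sorted(values, reverse=True)
    n, total = len(vals), sum(vals)
    k = n // 2                          # A is the smaller set when n is odd
    best_key, best_pick = (float("inf"), True), []
    chosen = []

    def search(i, s):
        nonlocal best_key, best_pick
        if len(chosen) == k:
            # Key: (difference of sums, True if A has the smaller sum).
            key = (abs(total - 2 * s), 2 * s < total)
            if key < best_key:
                best_key, best_pick = key, list(chosen)
            return
        if n - i < k - len(chosen):     # too few elements left to fill A
            return
        chosen.append(i)                # branch 1: vals[i] goes into A
        search(i + 1, s + vals[i])
        chosen.pop()
        if best_key[0] == 0:            # perfect split already found
            return
        search(i + 1, s)                # branch 2: vals[i] goes into B

    search(0, 0)
    picked = set(best_pick)
    a = [v for i, v in enumerate(vals) if i in picked]
    b = [v for i, v in enumerate(vals) if i not in picked]
    return a, b
```

With floating point values an exact zero difference is rare, so the early exit mostly helps on integer-like data.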

Juan Lopes
  • I see. This tells me I should definitely not be looking for a precise solution. Given that the data I will be using shouldn't be considered absolutely precise either, an approximate solution should suffice for my use case. +1 for pointing out that my original problem is likely equivalent to the partition problem. – Wormbo Aug 22 '15 at 18:07

My suggestion would be to sort the values, then consider consecutive pairs of values (v1, v2), (v3, v4), ..., putting one element of each pair into each partition.

The idea is to alternate putting the values into each set, so:

s1 = {v1, v4, v5, v8, . . . }
s2 = {v2, v3, v6, v7, . . . }

If there is an odd number of elements, put the last value into whichever set best meets your conditions.
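A short Python sketch of this sort-and-alternate heuristic, with the hypothetical name `alternate_split`. The odd case is one possible reading of "best meets your conditions" (an assumption): try the last value in each set and keep the placement with the smaller difference, preferring, on ties, the placement where the smaller set keeps the larger sum.

```python
def alternate_split(values):
    """Sort, then deal consecutive pairs (v1, v2), (v3, v4), ... so that
    s1 = {v1, v4, v5, v8, ...} and s2 = {v2, v3, v6, v7, ...}."""
    vals = sorted(values)
    s1, s2 = [], []
    m = len(vals) - len(vals) % 2          # values that form complete pairs
    for p in range(0, m, 2):
        lo, hi = vals[p], vals[p + 1]
        if (p // 2) % 2 == 0:
            s1.append(lo); s2.append(hi)   # 1st, 3rd, ... pair: low value to s1
        else:
            s1.append(hi); s2.append(lo)   # 2nd, 4th, ... pair: high value to s1
    if len(vals) % 2:
        last = vals[-1]
        t1, t2 = sum(s1), sum(s2)
        # Assumed rule: the set receiving `last` becomes the larger set.
        # Key = (|difference|, True if the larger set also has the larger sum).
        key_a = (abs(t1 + last - t2), t1 + last > t2)
        key_b = (abs(t2 + last - t1), t2 + last > t1)
        (s1 if key_a <= key_b else s2).append(last)
    return s1, s2
```

Apart from sorting, this is a single linear pass, so the whole heuristic runs in O(n log n).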

You have a relaxed definition of minimal, so a full search is unnecessary. The above should work quite well for many distributions of the values.

Gordon Linoff
  • This doesn't really answer the question. If the set is `[1, 2, 3, 4, 5, 7]`, this algorithm will divide it into `s1 = [1, 4, 5]` and `s2 = [2, 3, 7]`, while the correct answer is `s1 = [2, 4, 5]`, `s2 = [1, 3, 7]`. – Juan Lopes Aug 22 '15 at 17:14
  • I actually like this approach because it's cheap and may turn out to be a sufficiently good approximation. Even in the case it turns out to not be good enough on its own, it seems like a good starting point for some equally simple randomized post-processing. This is definitely worth looking into, thanks. – Wormbo Aug 22 '15 at 18:16
  • Yup, this approach produces very good results most of the time, even without any kind of post-processing. Iterating over the values in descending order seems to help reduce the error. – Wormbo Aug 23 '15 at 18:47
  • Also, in case of an odd number of values, putting the lowest value in one of the sets at the start gives much better results than putting it in the smaller set at the end. – Wormbo Aug 23 '15 at 19:47
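The refinement described in the last two comments (descending order, and placing the smallest value up front when the count is odd) might be sketched as follows. The name `alternate_split_desc` is hypothetical, and since the comment doesn't say which set receives the smallest value, this sketch simply tries both and keeps the better result under the question's criteria; that choice is an assumption, not the commenter's stated method.

```python
def alternate_split_desc(values):
    """Deal pairs in descending order; if the count is odd, first place the
    smallest value into one set. Both placements are tried (an assumption)."""
    vals = sorted(values, reverse=True)
    smallest = [vals.pop()] if len(vals) % 2 else []

    def deal(first, second):
        # Same alternation as before: first = {v1, v4, v5, v8, ...},
        # second = {v2, v3, v6, v7, ...}, here with v1 the largest value.
        for p in range(0, len(vals), 2):
            hi, lo = vals[p], vals[p + 1]
            if (p // 2) % 2 == 0:
                first.append(hi); second.append(lo)
            else:
                first.append(lo); second.append(hi)
        return first, second

    if not smallest:
        return deal([], [])
    candidates = [deal(smallest[:], []), deal([], smallest[:])]

    def key(split):
        small, large = sorted(split, key=len)   # sizes differ by exactly one
        # Minimize |difference|; on ties prefer the smaller set having the larger sum.
        return (abs(sum(small) - sum(large)), sum(small) < sum(large))

    return min(candidates, key=key)
```

Trying both placements only doubles an O(n) dealing pass, so the cost stays dominated by the sort.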