How to divide a set into two sets such that the difference of the average is minimum?

Question

As I understand, it is related to the partition problem.

But I would like to ask a slightly different problem which I don't care about the sum but the average. In this case, it needs to optimize 2 constraints (sum and number of items) at the same time. It seems to be a harder problem and I cannot see any solutions online.

Are there any solutions for this variant? Or how does it relate to the partition problem?

Example:

input X = [1,1,1,1,1,6]
output based on sum: A = [1,1,1,1,1], B=[6]
output based on average: A = [1], B=[1,1,1,1,6]

In your case, what is the constraint on the number of items? — Abhinav Mathur, Jan 28 '21 at 06:56
for example: [1,1,1,1,1,6] in partition pb: the answer should be [1,1,1,1,1], [6] but in this variance, answer should be [1] [1,1,1,1,6] because adding 1 to the second set not just adding value but also adding number of items leading to decrease average. — Imtk, Jan 28 '21 at 07:19

David Eisenstat · Answer 1 · 2021-01-28T20:04:24.300

On some inputs, a modification of the dynamic program for the usual partition problem will give a speedup. We have to classify each partial solution by its count and sum instead of just sum, which slows things down a bit. Python 3 below (note that the use of dictionaries implicitly collapses functionally identical partial solutions):

def children(ab, x):
    a, b = ab
    yield a + [x], b
    yield a, b + [x]


def proper(ab):
    a, b = ab
    return a and b


def avg(lst):
    return sum(lst) / len(lst)


def abs_diff_avg(ab):
    a, b = ab
    return abs(avg(a) - avg(b))


def min_abs_diff_avg(lst):
    solutions = {(0, 0): ([], [])}
    for x in lst:
        solutions = {
            (sum(a), len(a)): (a, b)
            for ab in solutions.values()
            for (a, b) in children(ab, x)
        }
    return min(filter(proper, solutions.values()), key=abs_diff_avg)


print(min_abs_diff_avg([1, 1, 1, 1, 1, 6]))

Could there be a useful way to try guess the average, subtract it from each element and run the partition problem on the sum? — גלעד ברקן, Jan 28 '21 at 20:29
@גלעדברקן certainly you can apply any non-constant affine function you like, but since the states include the count, that won't compress the DP table without another idea. — David Eisenstat, Jan 28 '21 at 21:57

grodzi · Answer 2 · 2021-01-28T13:02:19.243

let S_i the sum of a subset of v of size i

let S be the total sum of v, n the length of v

the err to minimize is

err_i = |avg(S_i) - avg(S-S_i)|
err_i = |S_i/i - (S-S_i)/(n-i)|
err_i = |(nS_i - iS)/(i(n-i))|

algorithm below does:

for all tuple sizes (1,...,n/2) as i
  - for all tuples of size i-1 as t_{i-1}
  - generate all possible tuple of size i from t_{i-1} by adjoining one elem from v
  - track best tuple in regard of err_i

The only cut I found being:

for two tuples of size i having the same sum, keep the one whose last element's index is the smallest

e.g given tuples A, B (where X is some taken element from v)

A: [X,....,X....]
B: [.,X,.....,X..]

keep A because its right-most element has the minimal index

(idea being that at size 3, A will offer the same candidates as B plus some more)

function generateTuples (v, tuples) {
  const nextTuples = new Map()

  for (const [, t] of tuples) {
    for (let l = t.l + 1; l < v.length; ++l) {
      const s = t.s + v[l]
      if (!nextTuples.has(s) || nextTuples.get(s).l > l) {
        const nextTuple = { v: t.v.concat(l), s, l }
        nextTuples.set(s, nextTuple)
      }
    }
  }
  return nextTuples
}

function processV (v) {
  const fErr = (() => {
    const n = v.length
    const S = v.reduce((s, x) => s + x, 0)
    return ({ s: S_i, v }) => {
      const i = v.length
      return Math.abs((n * S_i - i * S) / (i * (n - i)))
    }
  })()

  let tuples = new Map([[0, { v: [], s: 0, l: -1 }]])
  let best = null
  let err = 9e3
  for (let i = 0; i < Math.ceil(v.length / 2); ++i) {
    const nextTuples = generateTuples(v, tuples)
    for (const [, t] of nextTuples) {
      if (fErr(t) <= err) {
        best = t
        err = fErr(t)
      }
    }
    tuples = nextTuples
  }

  const s1Indices = new Set(best.v)
  return {
    sol: v.reduce(([v1, v2], x, i) => {
      (s1Indices.has(i) ? v1 : v2).push(x)
      return [v1, v2]
    }, [[], []]),
    err
  }
}
console.log('best: ', processV([1, 1, 1, 1, 1, 6]))
console.log('best: ', processV([1, 2, 3, 4, 5]))
console.log('best: ', processV([1, 3, 5, 7, 7, 8]))

Thank you for the great solution. I wonder how can we optimize it further? As it seems to relate to the partition problem, we could somehow use that fact to solve this problem. — Imtk, Jan 28 '21 at 14:08

How to divide a set into two sets such that the difference of the average is minimum?

2 Answers2