How to divide a list of negative and positive numbers into the largest number of subsets whose sum is 0?

Question

I am trying to solve this problem but I can't manage to figure out how.

Let's suppose I have a list of positive and negative numbers whose sum is guaranteed to be 0.

[-10, 1, 2, 20, 5, -100, -80, 10, 15, 15, 60, 100, -20, -18]

I want to obtain a list with the largest number of sub-sets, using all the elements of the initial list only once. And each subset must have the sum 0.

So in the case of this simple input:

[-5, -4, 5, 2, 3, -1]

The best results that can be obtained are:

1. [[-5, 5], [[-4, -1, 2, 3]]]   #2 subsets
2. [[-5, 2, 3], [-4, -1, 5]]     #2 subsets

These, for example, would be totally wrong answers:

1. [[-5, -4, -1, 2, 3, 5]]       #1 subsets that is the initial list, NO
2. [[-5,5]]                      #1 subset, and not all elements are used, NO

Even if it's NP-Complete, how can I manage to solve it even with a brute-force approach? I just need a solution for small list of numbers.

aeternalis1 · Accepted Answer · 2020-03-22T19:01:23.903

def get_subsets(lst):
    N = len(lst)
    cand = []
    dp = [0 for x in range(1<<N)]   # maximum number of subsets using elements represented by bitset
    back = [0 for x in range(1<<N)]

    # Section 1
    for i in range(1,1<<N):
        cur = 0
        for j in range(N):
            if i&(1<<j):
                cur += lst[j]
        if not cur:
            cand.append(i)     # if subset sums to 0, it's viable
    dp[0] = 1

    # Section 2
    for i in range(1<<N):
        while cand and cand[0] <= i:
            cand.pop(0)
        if not dp[i]:
            continue
        for j in cand:
            if i&j:     # if subsets intersect, it cannot be added
                continue
            if dp[i]+1 > dp[i|j]:
                back[i|j] = j
                dp[i|j] = dp[i]+1

    # Section 3
    ind = dp.index(max(dp))
    res = []

    while back[ind]:
        cur = []
        for i in range(N):
            if back[ind]&(1<<i):
                cur.append(lst[i])
        res.append(cur)
        ind = ind^back[ind]

    return res

print (get_subsets([-5, -4, 5, 2, 3, -1]))

Basically, this solution collects all subsets of the original list that can sum to zero, then attempts to merge as many of them together as possible without colliding. It runs in worst-case O(2^{2N}) time, where N is the length of the list, but it should hit an average case of around O(2^N), since there typically shouldn't be too many subsets summing to 0.

EDIT: I added sections to facilitate explanation of the algorithm

Section 1: I iterate through all possible 2^N-1 nonempty subsets of the original list, and check which of these subsets sum to 0; any viable zero-sum subsets are added to the list cand (represented as an integer in the range [1,2^N-1] with bits set at the indices making up the subset).

Section 2: dp is a dynamic programming table storing the maximum number of subsets summing to 0 that can be formed using the subset represented by the integer i at dp[i]. Initially, all entries of dp are set to 0 except dp[0] = 1, since the empty set has a sum of 0. Then I iterate through each subset from 0 to 2^N-1, and I run through the list of candidate subsets and attempt to merge the two subsets.

Section 3: This is just backtracking to find the answer: while filling in dp, I also kept an array back that stores the most recent subset added to achieve the subset i at back[i]. So I find the subset that maximizes the number of sub-subsets summing to 0 with ind = dp.index(max(dp)), and then I backtrack from there, shrinking the subset by removing the most recently added subset until I finally arrive back to the empty set.

Could you add an explanation of how this works? It looks very interesting, but I can barely follow it. — jirassimok, Mar 22 '20 at 17:54

score 1 · Answer 2 · answered Apr 02 '20 at 19:51

This problem is NP-complete, since it is a combination of two NP-complete problems:

finding a single subset whose sum is 0 is known as the subset sum problem
when you find all the subsets whose sum is 0, you have to solve an exact cover problem with a special condition: you want to maximize the number of subsets.

The following steps will provide a solution:

use dynamic programming to find the subsets whose sum is 0 ('https://en.wikipedia.org/wiki/Subset_sum_problem#Pseudo-polynomial_time_dynamic_programming_solution)
to maximize the number of subsets, one would use D. Knuth's Algorithm X to find the exact cover.

A few remarks:

First, we know that there is an exact cover because the list of numbers has a sum of 0.
Second, we can use only the subsets that are not supersets of any other subset. Because, if A is a superset of X (both sum to 0), A can't be in the cover that has the largest number of subsets. Let A, B, C, ... be the cover with the maximum number of subsets, then we can replace A by X and A\X (it is trivial to see that the sum of A\X elements is 0) and we get the cover X, A\X, B, C, ... that is better.
Third, when we use Algorithm X, all paths in the search tree will lead to a success. Let A, B, C, ... be a path composed of non overlapping subsets having each a sum of 0. Then the complent has also a sum of 0 (which may be a superset of another subset, and then we'll use 2.).

As you see, nothing new here, and I will use only well known techniques/algorithms.

Find the subsets having a sum of `0`.

The algorithm is well known. Here's a Python implementation based on Wikipedia explanations

class Q:
    def __init__(self, values):
        self.len = len(values)
        self.min = sum(e for e in values if e <= 0)
        self.max = sum(e for e in values if e >= 0)
        self._arr = [False] * self.len * (self.max - self.min + 1)

    def __getitem__(self, item):
        index, v = item
        return self._arr[v * self.len + index]

    def __setitem__(self, item, value):
        index, v = item
        self._arr[v * self.len + index] = value


class SubsetSum:
    def __init__(self, values):
        self._values = values
        self._q = Q(values)

    def prepare(self):
        for s in range(self._q.min, self._q.max + 1):
            self._q[0, s] = (self._values[0] == s)
        for i in range(self._q.len):
            self._q[i, 0] = True

        for i in range(1, self._q.len):
            v = self._values[i]
            for s in range(self._q.min, self._q.max + 1):
                self._q[i, s] = (v == s) or self._q[i - 1, s] or self._q[
                    i - 1, s - v]

    def subsets(self, target=0):
        yield from self._subsets(self._q.len - 1, target, [])

    def _subsets(self, i, target, p):
        assert i >= 0
        v = self._values[i]
        c = self._q[i - 1, target]
        b = self._q[i - 1, target - v]
        if i == 0:
            if target == 0:
                if p:
                    yield p
            elif self._q[0, target]:
                yield p + [i]
        else:
            if self._q.min <= target - v <= self._q.max and self._q[
                i - 1, target - v]:
                yield from self._subsets(i - 1, target - v, p + [i])

            if self._q[i - 1, target]:
                yield from self._subsets(i - 1, target, p)

Here's how it works:

arr = [-10, 1, 2, 20, 5, -100, -80, 10, 15, 15, 60, 100, -20, -18]
arr = sorted(arr)
s = SubsetSum(arr)
s.prepare()
subsets0 = list(s.subsets())
print(subsets0)

Output:

[[13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0], [13, 12, 11, 10, 9, 7, 6, 5, 3, 2, 1, 0], [13, 12, 11, 10, 9, 4, 2, 1, 0], [13, 12, 11, 10, 8, 7, 4, 2, 1, 0], [13, 12, 11, 10, 8, 6, 5, 4, 3, 1, 0], [13, 12, 11, 10, 7, 2, 1, 0], [13, 12, 11, 10, 6, 5, 3, 1, 0], [13, 12, 11, 9, 8, 7, 4, 2, 1, 0], [13, 12, 11, 9, 8, 6, 5, 4, 3, 1, 0], [13, 12, 11, 9, 7, 2, 1, 0], [13, 12, 11, 9, 6, 5, 3, 1, 0], [13, 12, 11, 8, 7, 6, 5, 3, 1, 0], [13, 12, 11, 8, 4, 1, 0], [13, 12, 11, 1, 0], [13, 12, 10, 9, 8, 7, 6, 5, 4, 3, 1, 0], [13, 12, 10, 9, 8, 2, 1, 0], [13, 12, 10, 9, 7, 6, 5, 3, 1, 0], [13, 12, 10, 9, 4, 1, 0], [13, 12, 10, 8, 7, 4, 1, 0], [13, 12, 10, 7, 1, 0], [13, 12, 9, 8, 7, 4, 1, 0], [13, 12, 9, 7, 1, 0], [13, 11, 10, 8, 6, 5, 4, 3, 2, 0], [13, 11, 10, 6, 5, 3, 2, 0], [13, 11, 9, 8, 6, 5, 4, 3, 2, 0], [13, 11, 9, 6, 5, 3, 2, 0], [13, 11, 8, 7, 6, 5, 3, 2, 0], [13, 11, 8, 4, 2, 0], [13, 11, 7, 6, 5, 4, 3, 2, 1], [13, 11, 7, 6, 5, 4, 3, 0], [13, 11, 2, 0], [13, 10, 9, 8, 7, 6, 5, 4, 3, 2, 0], [13, 10, 9, 7, 6, 5, 3, 2, 0], [13, 10, 9, 4, 2, 0], [13, 10, 8, 7, 4, 2, 0], [13, 10, 8, 6, 5, 4, 3, 2, 1], [13, 10, 8, 6, 5, 4, 3, 0], [13, 10, 7, 2, 0], [13, 10, 6, 5, 3, 2, 1], [13, 10, 6, 5, 3, 0], [13, 9, 8, 7, 4, 2, 0], [13, 9, 8, 6, 5, 4, 3, 2, 1], [13, 9, 8, 6, 5, 4, 3, 0], [13, 9, 7, 2, 0], [13, 9, 6, 5, 3, 2, 1], [13, 9, 6, 5, 3, 0], [13, 8, 7, 6, 5, 3, 2, 1], [13, 8, 7, 6, 5, 3, 0], [13, 8, 4, 2, 1], [13, 8, 4, 0], [13, 7, 6, 5, 4, 3, 1], [13, 2, 1], [13, 0], [12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1], [12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 0], [12, 11, 10, 9, 8, 2, 0], [12, 11, 10, 9, 7, 6, 5, 3, 2, 1], [12, 11, 10, 9, 7, 6, 5, 3, 0], [12, 11, 10, 9, 4, 2, 1], [12, 11, 10, 9, 4, 0], [12, 11, 10, 8, 7, 4, 2, 1], [12, 11, 10, 8, 7, 4, 0], [12, 11, 10, 8, 6, 5, 4, 3, 1], [12, 11, 10, 7, 2, 1], [12, 11, 10, 7, 0], [12, 11, 10, 6, 5, 3, 1], [12, 11, 9, 8, 7, 4, 2, 1], [12, 11, 9, 8, 7, 4, 0], [12, 11, 9, 8, 6, 5, 4, 3, 1], [12, 11, 9, 7, 2, 1], [12, 11, 9, 7, 0], [12, 11, 9, 6, 5, 3, 1], [12, 11, 8, 7, 6, 5, 3, 1], [12, 11, 8, 4, 1], [12, 11, 1], [12, 10, 9, 8, 7, 6, 5, 4, 3, 1], [12, 10, 9, 8, 2, 1], [12, 10, 9, 8, 0], [12, 10, 9, 7, 6, 5, 3, 1], [12, 10, 9, 4, 1], [12, 10, 8, 7, 4, 1], [12, 10, 7, 1], [12, 9, 8, 7, 4, 1], [12, 9, 7, 1], [11, 10, 8, 6, 5, 4, 3, 2], [11, 10, 6, 5, 3, 2], [11, 9, 8, 6, 5, 4, 3, 2], [11, 9, 6, 5, 3, 2], [11, 8, 7, 6, 5, 3, 2], [11, 8, 4, 2], [11, 7, 6, 5, 4, 3], [11, 2], [10, 9, 8, 7, 6, 5, 4, 3, 2], [10, 9, 7, 6, 5, 3, 2], [10, 9, 4, 2], [10, 8, 7, 4, 2], [10, 8, 6, 5, 4, 3], [10, 7, 2], [10, 6, 5, 3], [9, 8, 7, 4, 2], [9, 8, 6, 5, 4, 3], [9, 7, 2], [9, 6, 5, 3], [8, 7, 6, 5, 3], [8, 4]]

Reduce the number of subsets

We have 105 subsets that sum to 0, but we can remove the subsets that are superset of other subsets. We need a function to find if a list of elements contains all elements in another list. In Python:

import collections

def contains(l1, l2):
    """
    Does l1 contain all elements of l2?
    """
    c = collections.Counter(l1)
    for e in l2:
        c[e] -= 1
    return all(n >= 0 for n in c.values())

Now, we can remove the subsets that are supersets of another subset.

def remove_supersets(subsets):
    subsets = sorted(subsets, key=len)
    new_subsets = []
    for i, s1 in enumerate(subsets):
        for s2 in subsets[:i]: # smaller subsets
            if contains(s1, s2):
                break
        else:  # not a superset
            new_subsets.append(s1)
    return new_subsets

In our situation:

subsets0 = remove_supersets(subsets0)
print(len(subsets0))

Output:

[[13, 0], [11, 2], [8, 4], [13, 2, 1], [12, 11, 1], [10, 7, 2], [9, 7, 2], [12, 10, 7, 1], [12, 9, 7, 1], [10, 9, 4, 2], [10, 6, 5, 3], [9, 6, 5, 3], [12, 11, 10, 7, 0], [12, 11, 9, 7, 0], [12, 10, 9, 8, 0], [12, 10, 9, 4, 1], [8, 7, 6, 5, 3], [12, 11, 10, 9, 4, 0], [12, 10, 9, 8, 2, 1], [11, 7, 6, 5, 4, 3], [13, 7, 6, 5, 4, 3, 1]]
[[0, 2, 10, 6, 4], [0, 2, 10, 8, 1], [0, 2, 11, 5, 4], [0, 2, 11, 7, 1], [0, 16, 9, 4], [0, 16, 15, 1], [0, 18, 19], [3, 2, 12, 11], [3, 2, 13, 10], [3, 17, 16], [3, 19, 14], [20, 14, 1]]

We managed to reduce the number of subsets to 21, that is a good improvement since we need to explore all possibilities to find an exact cover.

Algorithm X

I do not use the dancing links here (I think that technique is we'll designed for low level languages like C, but you can implement them in Python if you want). We just need to keep track of the remaing subsets:

class Matrix:
    def __init__(self, subsets, ignore_indices=set()):
        self._subsets = subsets
        self._ignore_indices = ignore_indices

    def subset_values(self, i):
        assert i not in self._ignore_indices
        return self._subsets[i]

    def value_subsets_indices(self, j):
        return [i for i, s in self._subsets_generator() if j in s]

    def _subsets_generator(self):
        return ((i, s) for i, s in enumerate(self._subsets) if
                i not in self._ignore_indices)

    def rarest_value(self):
        c = collections.Counter(
            j for _, s in self._subsets_generator() for j in s)
        return c.most_common()[-1][0]

    def take_subset(self, i):
        s = self._subsets[i]
        to_ignore = {i2 for i2, s2 in self._subsets_generator() if
                     set(s2) & set(s)}
        return Matrix(self._subsets,
                      self._ignore_indices | to_ignore)

    def __bool__(self):
        return bool(list(self._subsets_generator()))

And finally the cover function:

def cover(m, t=[]):
    if m: # m is not empty
        j = m.rarest_value()
        for i in m.value_subsets_indices(j):
            m2 = m.take_subset(i)
            yield from cover(m2, t + [i])
    else:
        yield t

Finally, we have:

m = Matrix(subsets0)
ts = list(cover(m))
t = max(ts, key=len)
print([[arr[j] for j in subsets0[i]] for i in t])

Output:

[[100, -100], [10, -10], [15, 2, 1, -18], [15, 5, -20], [60, 20, -80]]

score 0 · Answer 3 · answered Mar 22 '20 at 18:23

Below essentially the same idea as Michael Huang, with 30 more lines...

A solution with cliques

We can prebuild all subsets whose sum is 0.
- Build subsets of 1 elem;
- then of size 2 by reusing the previous ones
- and keep those whose sum is zero along the way

Now say such subset is a node of a graph. Then a node is in relation with another one iff their associated subset has no number in common.

We thus want to build the maximum clique of the graph:
- In a clique, all nodes are in relation idem their subsets are disjoints
- The maximum clique gives us the maximal number of subsets

function forall (v, reduce) {
  const nexts = v.map((el, i) => ({ v: [el], i, s: el })).reverse()
  while (nexts.length) {
    const next = nexts.pop()
    for (let i = next.i + 1; i < v.length; ++i) {
      const { s, skip } = reduce(next, v[i])
      if (!skip) {
        nexts.push({ v: next.v.concat(v[i]), s: s, i })
      }
    }
  }
}
function buildSubsets (numbers) {
  const sums = []
  forall(numbers, (next, el) => {
    const s = next.s + el
    if (s === 0) {
      sums.push({ s, v: next.v.concat(el) })
      return { s, skip: true }
    }
    return { s }
  })
  return sums
}
const bin2decs = bin => {
  const v = []
  const s = bin.toString(2)
  for (let i = 0; i < s.length; ++i) {
    if (intersects(dec2bin(i), bin)) {
      v.push(i)
    }
  }
  return v
}
const dec2bin = dec => Math.pow(2, dec)
const decs2bin = decs => decs.reduce((bin, dec) => union(dec2bin(dec), bin), 0)
// Set methods on int
const isIn = (a, b) => (a & b) === a
const intersects = (a, b) => a & b
const union = (a, b) => a | b

// if a subset contains another one, discard it
// e.g [1,2,4] should be discarded if [1,2] is present
const cleanSubsets = bins => bins.filter(big => bins.every(b => big === b || !isIn(b, big)))
function bestClique (decs) {
  const cliques = []
  forall(decs, (next, el) => {
    if (intersects(next.s, el)) { return { skip: true } }
    const s = union(next.s, el)
    cliques.push({ s, v: next.v.concat(el) })
    return { s }
  })
  return cliques.sort((a, b) => b.v.length - a.v.length)[0]
}
// in case we have duplicated numbers in the list,
// they are still uniq thanks to their id: i (idem position in the list)
const buildNumbers = v => v.map((n, i) => {
  const u = new Number(n)
  u.i = i
  return u
})
function run (v) {
  const numbers = buildNumbers(v)
  const subs = buildSubsets(numbers)
  const bins = subs.map(s => decs2bin(s.v.map(n => n.i)))
  const clique = bestClique(cleanSubsets(bins))
  const indexedSubs = clique.v.map(bin2decs)
  const subsets = indexedSubs.map(sub => sub.map(i => numbers[i].valueOf()))
  console.log('subsets', JSON.stringify(subsets))
}

run([1, -1, 2, -2])
run([-10, 1, 2, 20, 5, -100, -80, 10, 15, 15, 60, 100, -20, -18, 10, -10])
run([-5, -4, 5, 2, 3, -1])

How to divide a list of negative and positive numbers into the largest number of subsets whose sum is 0?

3 Answers3

Find the subsets having a sum of 0.

Reduce the number of subsets

Algorithm X

Find the subsets having a sum of `0`.