
My rules:

  • Duplicates are allowed.
  • Negative numbers are allowed, obviously.
  • Since I mentioned partition, an element of the array cannot appear in more than one partition.
    • Partitions are subsets; they need not be contiguous chunks of the array.
    • The elements of the input array are not in sorted order.
    • The sum of all the elements in the input array is always 0 (given condition).

Example: if A = {-2, -4, -6, +6, +6} then B = {{-2, -4, +6}, {+6, -6}} is the result with the maximum number of partitions.

FYI, I want to return all the partitions and not the number of partitions.

From my research, this seems like an NP-hard/NP-complete problem. But I'm not sure about that, and I would really appreciate it if someone could explain the best way to go about it (even if it's slow). Some pseudo-code would be appreciated too.

Thank you.

nishant_boro

2 Answers


This definitely has an NP-hard feel to it. In particular, the question of whether 2 partitions are possible rather than 1 is the question of whether some subset of the elements other than the last adds up to the negative of the last, which is a variant of the subset sum problem. So even verifying that an answer can't be improved by splitting one of its partitions is probably NP-complete!

But how would one solve this in practice?

Step 1 is to produce a data structure representing all possible partitions that include the first element and sum to 0. This can be built as in the standard subset sum algorithm, with a doubly linked list from which we can read the partitions back. (The doubly linked list will generally be proportional in size to your array times the number of distinct sums found, but when decoded it can yield an exponential number of partitions.)

Doubly linked lists can make your head spin, so you probably want to use syntactic sugar as follows:

from collections import namedtuple
Node = namedtuple('Node', ['ind', 'prev'])
Paths = namedtuple('Paths', ['node', 'prev'])

Where ind is an index into your array, node is always a Node, and prev is always a Paths or None.

And now a Node says "include this index, plus any valid path among the previous choices". And a Paths says "here is one node that leads here, plus previous Paths for the other ways."
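
To make that concrete, here is a tiny hand-built example with a decoder (my own illustration, not part of the answer; `decode` is a hypothetical helper name):

```python
from collections import namedtuple

Node = namedtuple('Node', ['ind', 'prev'])
Paths = namedtuple('Paths', ['node', 'prev'])

def decode(paths):
    """Yield every index list reachable from a Paths chain."""
    while paths is not None:
        node = paths.node
        if node.prev is None:
            yield [node.ind]
        else:
            for rest in decode(node.prev):
                yield rest + [node.ind]
        paths = paths.prev

# Two ways to extend paths through index 0: via index 2, or via index 1.
base = Paths(Node(0, None), None)
p = Paths(Node(2, base), Paths(Node(1, base), None))
print(sorted(decode(p)))  # → [[0, 1], [0, 2]]
```

The point of the structure is exactly this sharing: `base` is stored once but appears in both decoded paths.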

With this we can do the following:

# We only want paths that take the 0th index.
ways_to_get_n = {elements[0]: Paths(Node(0, None), None)}

for i in range(1, len(elements)):
    element = elements[i]
    # Start from a copy: every old sum is still reachable without element i.
    new_ways_to_get_n = dict(ways_to_get_n)
    for value, ways in ways_to_get_n.items():
        if value + element not in new_ways_to_get_n:
            new_ways_to_get_n[value + element] = Paths(Node(i, ways), None)
        else:
            new_ways_to_get_n[value + element] = Paths(Node(i, ways), new_ways_to_get_n[value + element])
    ways_to_get_n = new_ways_to_get_n

When you are done, ways_to_get_n[0] is a fairly concise data structure that can be walked with a double recursion (over each Paths chain and over each Node's previous choices) to produce all partitions that include the first element. However there is a challenge: these partitions may themselves contain proper subsets that sum to 0, so they are not all minimal. As you walk down the double recursion, carry along a data structure of all values you can reach (the same old subset sum trick), and terminate a branch early whenever 0 shows up. (This bookkeeping may feel like a lot of extra work, but it actually will save you a lot more.)
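
Putting the table and the walk together, here is a self-contained sketch of my own (without the early-termination bookkeeping, and with `zero_sum_subsets_with_first` as a name I made up), run on the question's example array:

```python
from collections import namedtuple

Node = namedtuple('Node', ['ind', 'prev'])
Paths = namedtuple('Paths', ['node', 'prev'])

def zero_sum_subsets_with_first(elements):
    """All subsets that contain index 0 and sum to 0, as sorted index tuples."""
    ways_to_get_n = {elements[0]: Paths(Node(0, None), None)}
    for i in range(1, len(elements)):
        element = elements[i]
        new_ways = dict(ways_to_get_n)  # every old sum still reachable
        for value, ways in ways_to_get_n.items():
            # Link a new Node in front of whatever already reached this sum.
            new_ways[value + element] = Paths(Node(i, ways),
                                              new_ways.get(value + element))
        ways_to_get_n = new_ways

    def decode(paths):
        while paths is not None:
            node = paths.node
            if node.prev is None:
                yield (node.ind,)
            else:
                for rest in decode(node.prev):
                    yield rest + (node.ind,)
            paths = paths.prev

    if 0 not in ways_to_get_n:
        return []
    return sorted(decode(ways_to_get_n[0]))

print(zero_sum_subsets_with_first([-2, -4, -6, 6, 6]))
# → [(0, 1, 2, 3, 4), (0, 1, 3), (0, 1, 4)]
```

Note that (0, 1, 2, 3, 4) is not minimal — it contains (0, 1, 3) — which is exactly why the walk needs the early-termination check described above.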

And now you are able to recursively find minimal partitions containing the first element. Then recursively look for ways to partition the remaining elements. Every time you complete a decomposition, compare it to the best you currently have, and record it if it is an improvement.

When you've gone through every way to decompose it, you're done.

Assuming an array of integers (so the pseudopolynomial subset sum trick is likely to work in your favor), this lets you fairly efficiently recurse over minimal partitions, which is a smaller recursive explosion than a naive approach. But it will still be a lot bigger than you think.

I suspect that sorting the array in increasing absolute value as a first step will make this algorithm more efficient by making the "filter for irreducible partition" likely to break out of the recursion early when you still have a lot of elements.
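
That preprocessing step is a one-liner (shown here on a shuffled copy of the example array; note the sort is stable, so ties keep their relative order):

```python
elements = [-2, 6, -4, 6, -6]
elements.sort(key=abs)  # sort by increasing absolute value
print(elements)  # → [-2, -4, 6, 6, -6]
```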

btilly
  • This is another good attack and discourse. Note that the problem does not state that the input list will be exhausted: there is no guarantee that any particular integer will be included in the optimal solution, or any solution. I'm not correcting, just highlighting a possible implementation assumption that isn't valid. – Prune Oct 28 '19 at 22:38
  • @Prune Actually there is a guarantee. The statement that the sum of the original array is 0 means that making all the integers not part of a proposed answer into an additional partition improves that answer. Therefore the optimal solution must include every integer. – btilly Oct 28 '19 at 22:52
  • Ah, right! Never mind the comment, except for the first sentence. :-) – Prune Oct 28 '19 at 23:11
  • Nice solution. But I'm not sure if this is the fastest way. I saw this other post about using Bin packing and partition problem Any idea on how to use those? – nishant_boro Oct 29 '19 at 03:02

I agree that the problem is NP-hard. Optimizing this is ugly, with a capital "Ugh". You can improve the search time somewhat, but I fear that it's still O(2^N) or worse.

  • Partition the list into positive and negative numbers; sort both lists.
  • For each element of the shorter list, look for its complement in the other. Each such pair is a partition; these can be safely put on the solution list and removed from further processing.
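
The pairing step above can be sketched like this (my own code; `peel_exact_pairs` is a name I made up, and I also peel off any literal zeros as singleton partitions, a case the answer doesn't discuss):

```python
from collections import Counter

def peel_exact_pairs(values):
    """Remove (x, -x) pairs as ready-made partitions; return (pairs, leftovers)."""
    counts = Counter(values)
    pairs = []
    for v in sorted(counts, reverse=True):
        if v <= 0:
            break  # positives come first in reverse order; negatives are consumed via pairing
        n = min(counts[v], counts[-v])
        pairs.extend([v, -v] for _ in range(n))
        counts[v] -= n
        counts[-v] -= n
    # A literal 0 is a zero-sum partition all by itself.
    pairs.extend([0] for _ in range(counts.pop(0, 0)))
    leftovers = [v for v, c in counts.items() for _ in range(c)]
    return pairs, leftovers

pairs, rest = peel_exact_pairs([-2, -4, -6, 6, 6])
print(pairs)         # → [[6, -6]]
print(sorted(rest))  # → [-4, -2, 6]
```

On the question's example this immediately yields the {+6, -6} partition and leaves {-2, -4, +6} for the harder search.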

Now comes the ugly part. The relatively brute-force approach is to generate the power set of each list and index it by sum. For instance, given the list 2, 3, 4, 7:

 2  [2]
 3  [3]
 4  [4]
 5  [2, 3]
 6  [2, 4]
 7  [7] [3, 4]
 9  [2, 7] [2, 3, 4]
10  [3, 7]
11  [4, 7]
12  [2, 3, 7]
13  [2, 4, 7]
14  [3, 4, 7]
16  [2, 3, 4, 7]

Now find all matches of abs(sum(subset)) between the positive and negative listings. These form the candidates for your solution space. My best recommendation from this point is to treat it as a set cover problem; you'll have to tweak slightly for duplicate values.
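
Building that sum-indexed table and matching the two sides might look like this (my own sketch; `subsets_by_sum` is a hypothetical helper, and the negative list is just an illustration, this is exponential in the list length):

```python
from itertools import combinations

def subsets_by_sum(values):
    """Index every non-empty subset of `values` by its sum."""
    table = {}
    for r in range(1, len(values) + 1):
        for combo in combinations(values, r):
            table.setdefault(sum(combo), []).append(list(combo))
    return table

pos = subsets_by_sum([2, 3, 4, 7])
neg = subsets_by_sum([-3, -6, -7])  # illustrative negative side
print(pos[7])  # → [[7], [3, 4]]

# Candidate zero-sum partitions: match |sum| across the two tables.
matches = {s: (pos[s], neg[-s]) for s in pos if -s in neg}
```

Each entry of `matches` pairs the positive subsets and negative subsets that cancel; choosing a disjoint collection of such pairings is the set cover style step.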

Prune
  • Going for quick and easy partitions is a greedy algorithm. It will get good solutions fairly quickly, but not guaranteed best partitions. Yes, "best" vs "good" makes this a much harder problem! – btilly Oct 28 '19 at 22:22
  • I've removed *only* pairs of the form (i, -i). I believe that this is safe for finding optimal solutions. – Prune Oct 28 '19 at 22:23
  • Ah. I'm uncertain, but can't easily come up with a counter-example. – btilly Oct 28 '19 at 22:33