
The Problem

Given a set of integers, find a subset of those integers that sums to 100,000,000.

Solution

I am attempting to build a tree containing all the combinations of the given set along with the sum. For example, if the given set looked like 0,1,2, I would build the following tree, checking the sum at each node:

                    {}
        {}                      {0}
  {}         {1}         {0}          {0,1}
{}  {2}  {1}   {1,2}  {0}  {0,2}  {0,1}   {0,1,2}

Since I keep both the array of integers at each node and the sum, I should only need the bottom (current) level of the tree in memory.

Issues

My current implementation will maintain the entire tree in memory and therefore uses way too much heap space.

How can I change my current implementation so that the GC will take care of my upper tree levels?

(At the moment I am just throwing a RuntimeException when I have found the target sum but this is obviously just for playing around)

public class RecursiveSolver {
    static final int target = 100000000;
    static final int[] set = new int[]{98374328, 234234123, 2341234, 123412344, etc...};

    Tree initTree() {
        return nextLevel(new Tree(null), 0);
    }

    Tree nextLevel(Tree currentLocation, int current) {
        if (current == set.length) { return null; }
        else if (currentLocation.sum == target) throw new RuntimeException(currentLocation.getText());
        else {
            currentLocation.left = nextLevel(currentLocation.copy(), current + 1);
            Tree right = currentLocation.copy();
            right.value = add(currentLocation.value, set[current]);
            right.sum = currentLocation.sum + set[current];
            currentLocation.right = nextLevel(right, current + 1);
            return currentLocation;
        }
    }

    int[] add(int[] array, int digit) {
        if (array == null) {
            return new int[]{digit};
        }
        int[] newValue = new int[array.length + 1];
        for (int i = 0; i < array.length; i++) {
            newValue[i] = array[i];
        }
        newValue[array.length] = digit;
        return newValue;
    }

    public static void main(String[] args) {
        RecursiveSolver rs = new RecursiveSolver();
        Tree subsetTree = rs.initTree();
    }
}

class Tree {
    Tree left;
    Tree right;
    int[] value;
    int sum;

    Tree(int[] value) {
        left = null;
        right = null;
        sum = 0;
        this.value = value;
        if (value != null) {
            for (int i = 0; i < value.length; i++) sum += value[i];
        }
    }

    Tree copy() {
        return new Tree(this.value);
    }
}
jimpudar
  • _The_ subset or _a_ subset? – erip Jul 16 '16 at 19:12
  • Good point - _a_ subset – jimpudar Jul 16 '16 at 19:15
  • This is actually quite a famous problem: [the subset sum problem](https://en.wikipedia.org/wiki/Subset_sum_problem). If you don't want to use a tree, there's a good DP solution. – erip Jul 16 '16 at 19:16
  • I am well aware of its fame, and have been exploring several different solutions. In this question, I am not asking how to solve the problem but how to improve the space complexity of my existing algorithm. – jimpudar Jul 16 '16 at 19:18
  • To address your titular question: there's no way with this approach. You're creating every permutation, which is `O(2^n)` permutations. Since you're storing each permutation as a node, that's `O(2^n)` space. – erip Jul 16 '16 at 19:23
  • Forgive me for being unclear. The way I see it, I don't need to store any of the nodes above the bottom most level at any given time. Maybe what I am really asking here is, what is the best way to make the recursive call _tail recursive_ – jimpudar Jul 16 '16 at 19:33
  • Perhaps a tree isn't the best structure for the job. You can generate the permutations tail recursively. Unfortunately Java doesn't have an asynchronous return (e.g., `yield`), but if you can do it in Python or Scala, you could leverage this for early stopping. – erip Jul 16 '16 at 19:37
  • Not trying for the best implementation here, just experimenting. By the way, with a large target value like 100000000, the DP solution is actually usually slower than a targeted brute force approach. – jimpudar Jul 16 '16 at 19:44
  • You're asking for advice about how to improve performance, but you're totally ignoring my advice to not use a tree... best of luck. – erip Jul 16 '16 at 19:46
  • You are very right here, thanks for the comments. While I was visualizing the algorithm as a tree, I didn't realize there was no need to actually store that tree. I have added an answer to reflect that. – jimpudar Jul 16 '16 at 22:08
  • It seems that building the tree (keeping it in memory or not) is not the best way, because the non-leaf nodes are useless. If you use a boolean array to represent which numbers you are adding (1001: you add the first and the last number), you only need to iterate over all options. Furthermore, 1001 is the number 9 and the next combination is 1010, which is 10 (9+1). --> for( i in 0 to n ) { selected = toBinArray(i); ... } – Daniel Argüelles Jul 18 '16 at 15:19
  • Do you have the actual set of numbers? I want to try and find how long my DP solution takes to solve this problem. – Kedar Mhaswade Aug 09 '16 at 17:34
  • Sure @KedarMhaswade, you can find the original problem here: http://opengarden.com/jobs/ My best time for DP solution was more than a couple of seconds, whereas my meet in the middle attack completes this in under 100ms – jimpudar Aug 09 '16 at 17:35

3 Answers


The problem is NP-complete.

If you really want to improve performance, then you have to forget about your tree implementation. You either have to just generate all the subsets and sum them up or to use dynamic programming.

The choice depends on the number of elements and on the sum you want to achieve. Here the sum is fixed at 100,000,000, and the brute-force exponential algorithm runs in O(2^n * n) time, so brute force makes sense for fewer than about 22 elements.

In Python you can generate all the subsets with a simple:

from itertools import chain, combinations

def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))

You can significantly improve this complexity (at the cost of memory) by using the meet-in-the-middle technique (read the wiki article). This reduces the running time to roughly O(2^(n/2)), which means that it will perform better than the DP solution for n <~ 53.

Salvador Dali

The time and space you need for building the tree here is absolutely nothing at all.

The reason is that, if you're given

  • A node of the tree
  • The depth of the node
  • The ordered array of input elements

you can simply compute its parent, left, and right children nodes using O(1) operations. And you have access to each of those things while you're traversing the tree, so you don't need anything else.
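As a rough illustration of that point (a sketch, not code from the question): if a node is encoded as a bitmask of the chosen indices together with its running sum and depth, each neighbour follows from a couple of arithmetic operations. The class name `ImplicitNode` and the sample array `SET` are hypothetical, and the encoding assumes at most 64 input elements.

```java
public class ImplicitNode {
    // Hypothetical sample input; the real set from the question is not reproduced here.
    static final int[] SET = {3, 5, 8};

    final long mask;  // bit i is set when SET[i] is in the subset
    final long sum;   // sum of the chosen elements
    final int depth;  // number of elements already decided

    ImplicitNode(long mask, long sum, int depth) {
        this.mask = mask;
        this.sum = sum;
        this.depth = depth;
    }

    // Left child: skip SET[depth].
    ImplicitNode left() {
        return new ImplicitNode(mask, sum, depth + 1);
    }

    // Right child: take SET[depth].
    ImplicitNode right() {
        return new ImplicitNode(mask | (1L << depth), sum + SET[depth], depth + 1);
    }

    // Parent: undo the decision that was made for SET[depth - 1].
    ImplicitNode parent() {
        int i = depth - 1;
        boolean taken = ((mask >> i) & 1L) == 1L;
        return new ImplicitNode(mask & ~(1L << i), taken ? sum - SET[i] : sum, i);
    }

    public static void main(String[] args) {
        // Take 3, skip 5, take 8, then step back up one level.
        ImplicitNode node = new ImplicitNode(0L, 0L, 0).right().left().right();
        System.out.println(node.mask + " " + node.sum + " " + node.depth);
        ImplicitNode up = node.parent();
        System.out.println(up.mask + " " + up.sum + " " + up.depth);
    }
}
```

Nothing here keeps references to other nodes, so a traversal only ever holds the nodes on the current path.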

  • Although this doesn't answer the question about implementation, this is definitely the point which I was missing in my question. Especially since I don't even need its parent or children nodes; all I need to do is find the correct node. – jimpudar Jul 16 '16 at 22:43
  • There is no such thing as `O(0)` – Salvador Dali Jul 16 '16 at 23:20
  • @SalvadorDali: Sure there is. But I suppose I really mean to say "always zero time" rather than "zero time for any sufficiently large problem", so I'll make the correction. –  Jul 16 '16 at 23:33
  • @Hurkyl Those are, indeed, the same by the definition of big Oh applied to `g(x) = 0`. ;) – erip Jul 16 '16 at 23:48

After thinking more about erip's comments, I realized he is correct - I shouldn't be using a tree to implement this algorithm.

Brute force usually is O(n*2^n) because there are n additions for 2^n subsets. Because I only do one addition per node, the solution I came up with is O(2^n) where n is the size of the given set. Also, this algorithm is only O(n) space complexity. Since the number of elements in the original set in my particular problem is small (around 25) O(2^n) complexity is not too much of a problem.

The dynamic solution to this problem is O(t*n) where t is the target sum and n is the number of elements. Because t is very large in my problem, the dynamic solution ends up with a very long runtime and a high memory usage.
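For comparison, here is a minimal sketch of the O(t*n) dynamic-programming idea, assuming non-negative inputs and a reachability table indexed by sum. The class name `SubsetSumDp` and the small sample values are hypothetical. The table has t + 1 entries and is scanned once per element, which is why a target of 100,000,000 makes this approach expensive.

```java
import java.util.BitSet;

public class SubsetSumDp {
    // Returns true when some subset of the non-negative values sums to target.
    static boolean reachable(int[] values, int target) {
        BitSet sums = new BitSet(target + 1);
        sums.set(0); // the empty subset sums to 0
        for (int v : values) {
            // Walk downwards so that each value is used at most once.
            for (int s = target - v; s >= 0; s--) {
                if (sums.get(s)) sums.set(s + v);
            }
        }
        return sums.get(target);
    }

    public static void main(String[] args) {
        int[] sample = {3, 34, 4, 12, 5, 2}; // hypothetical sample data
        System.out.println(reachable(sample, 9));  // 4 + 5 reaches 9
        System.out.println(reachable(sample, 30)); // no subset reaches 30
    }
}
```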

The recursive code below solves my particular instance in around 311 ms on my machine, which is a tremendous improvement over the dynamic programming solutions I have seen for this particular class of problem.

public class TailRecursiveSolver {
    public static void main(String[] args) {
        final long starttime = System.currentTimeMillis();
        try {
            step(new Subset(null, 0), 0);
        }
        catch (RuntimeException ex) {
            System.out.println(ex.getMessage());
            final long endtime = System.currentTimeMillis();
            System.out.println(endtime - starttime);
        }
    }

    static final int target = 100000000;
    static final int[] set = new int[]{ . . . };

    static void step(Subset current, int counter) {
        if (current.sum == target) throw new RuntimeException(current.getText());
        else if (counter == set.length) {}
        else {
            step(new Subset(add(current.subset, set[counter]), current.sum + set[counter]), counter + 1);
            step(current, counter + 1);
        }
    }

    static int[] add(int[] array, int digit) {
        if (array == null) {
            return new int[]{digit};
        }
        int[] newValue = new int[array.length + 1];
        for (int i = 0; i < array.length; i++) {
            newValue[i] = array[i];
        }
        newValue[array.length] = digit;
        return newValue;
    }
}

class Subset {
    int[] subset;
    int sum;

    Subset(int[] subset, int sum) {
        this.subset = subset;
        this.sum = sum;
    }

    public String getText() {
        String ret = "";
        for (int i = 0; i < (subset == null ? 0 : subset.length); i++) {
            ret += " + " + subset[i];
        }
        if (ret.startsWith(" ")) {
            ret = ret.substring(3);
            ret = ret + " = " + sum;
        } else ret = "null";
        return ret;
    }
}

EDIT -

The above code still runs in O(n*2^n) time, since the add method runs in O(n) time. The following code runs in true O(2^n) time and is much faster, completing in around 20 ms on my machine.

It is limited to sets of fewer than 64 elements because it stores the current subset as the bits of a long.

public class SubsetSumSolver {
    static boolean found = false;
    static final int target = 100000000;
    static final int[] set = new int[]{ . . . };

    public static void main(String[] args) {
        step(0,0,0);
    }

    static void step(long subset, int sum, int counter) {
        if (sum == target) {
            found = true;
            System.out.println(getText(subset, sum));
        }
        else if (!found && counter != set.length) {
            step(subset + (1L << counter), sum + set[counter], counter + 1); // 1L: an int shift would wrap once counter reaches 32
            step(subset, sum, counter + 1);
        }
    }

    static String getText(long subset, int sum) {
        String ret = "";
        for (int i = 0; i < 64; i++) if((1 & (subset >> i)) == 1) ret += " + " + set[i];
        if (ret.startsWith(" ")) ret = ret.substring(3) + " = " + sum;
        else ret = "null";
        return ret;
    }
}

EDIT 2 -

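Here is another version that uses a meet in the middle attack, along with a little bit shifting, so that only 2^(n/2) subset sums are enumerated for each half instead of 2^n for the whole set. Note that the nested loop in main still compares every pair of half-sums, so the matching step as written performs on the order of 2^n comparisons; reaching O(2^(n/2)) overall (up to a logarithmic factor) requires sorting one list or using a hash lookup for the matching.

If you want to use this for sets with between 32 and 64 elements, you should change the int that represents the current subset in the step function to a long. Performance will obviously decrease drastically as the set size increases. If you want to use this for a set with an odd number of elements, you should add a 0 to the set to make its size even.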

import java.util.ArrayList;
import java.util.List;

public class SubsetSumMiddleAttack {
    static final int target = 100000000;
    static final int[] set = new int[]{ ... };

    static List<Subset> evens = new ArrayList<>();
    static List<Subset> odds = new ArrayList<>();

    static int[][] split(int[] superSet) {
        int[][] ret = new int[2][superSet.length / 2]; 

        for (int i = 0; i < superSet.length; i++) ret[i % 2][i / 2] = superSet[i];

        return ret;
    }

    static void step(int[] superSet, List<Subset> accumulator, int subset, int sum, int counter) {
        accumulator.add(new Subset(subset, sum));
        if (counter != superSet.length) {
            step(superSet, accumulator, subset + (1 << counter), sum + superSet[counter], counter + 1);
            step(superSet, accumulator, subset, sum, counter + 1);
        }
    }

    static void printSubset(Subset e, Subset o) {
        String ret = "";
        for (int i = 0; i < 32; i++) {
            if (i % 2 == 0) {
                if ((1 & (e.subset >> (i / 2))) == 1) ret += " + " + set[i];
            }
            else {
                if ((1 & (o.subset >> (i / 2))) == 1) ret += " + " + set[i];
            }
        }
        if (ret.startsWith(" ")) ret = ret.substring(3) + " = " + (e.sum + o.sum);
        System.out.println(ret);
    }

    public static void main(String[] args) {
        int[][] superSets = split(set);

        step(superSets[0], evens, 0,0,0);
        step(superSets[1], odds, 0,0,0);

        for (Subset e : evens) {
            for (Subset o : odds) {
                if (e.sum + o.sum == target) printSubset(e, o);
            }
        }
    }
}

class Subset {
    int subset;
    int sum;

    Subset(int subset, int sum) {
        this.subset = subset;
        this.sum = sum;
    }
}
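
The nested loop in main above checks every even-half subset against every odd-half subset. As a hedged sketch of an alternative matching step (not the code I timed), the sums of each half can be stored in a HashMap so that each lookup takes expected constant time. The class name `MeetInTheMiddleLookup`, the method names, and the sample halves are hypothetical, and each half is assumed to hold fewer than 32 elements so that its subset fits in an int.

```java
import java.util.HashMap;
import java.util.Map;

public class MeetInTheMiddleLookup {
    // Records one bitmask for every distinct subset sum of the given half.
    static void collect(int[] half, int index, int mask, long sum, Map<Long, Integer> out) {
        if (index == half.length) {
            out.putIfAbsent(sum, mask);
            return;
        }
        collect(half, index + 1, mask, sum, out);                              // skip half[index]
        collect(half, index + 1, mask | (1 << index), sum + half[index], out); // take half[index]
    }

    // Returns {leftMask, rightMask} for one solution, or null when none exists.
    static int[] solve(int[] left, int[] right, long target) {
        Map<Long, Integer> leftSums = new HashMap<>();
        collect(left, 0, 0, 0L, leftSums);
        Map<Long, Integer> rightSums = new HashMap<>();
        collect(right, 0, 0, 0L, rightSums);
        for (Map.Entry<Long, Integer> e : leftSums.entrySet()) {
            // Look up the complement of this left sum among the right sums.
            Integer match = rightSums.get(target - e.getKey());
            if (match != null) return new int[]{e.getValue(), match};
        }
        return null;
    }

    public static void main(String[] args) {
        int[] left = {3, 34, 4};  // hypothetical sample halves
        int[] right = {12, 5, 2};
        int[] hit = solve(left, right, 9);
        System.out.println(hit == null ? "none" : hit[0] + " " + hit[1]);
    }
}
```

Several mask pairs can reach the same target, so which one is returned depends on the map's iteration order.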
jimpudar