
We have a number of payments (Transaction) that come into our business each day. Each Transaction has an ID and an Amount. We have the requirement to match a number of these transactions to a specific amount. Example:

Transaction    Amount
1              100
2              200
3              300
4              400
5              500

If we wanted to find the transactions that add up to 600, we would have a number of sets: (1,2,3), (2,4), (1,5).

I found an algorithm that I have adapted, which works as defined below. For 30 transactions it takes 15ms. But the number of transactions averages around 740 and has a maximum close to 6000. Is there a more efficient way to perform this search?

sum_up(TransactionList, remittanceValue, ref MatchedLists);

private static void sum_up(List<Transaction> transactions, decimal target, ref List<List<Transaction>> matchedLists)
{
    sum_up_recursive(transactions, target, new List<Transaction>(), ref matchedLists);
}

private static void sum_up_recursive(List<Transaction> transactions, decimal target, List<Transaction> partial, ref List<List<Transaction>> matchedLists)
{
    decimal s = 0;
    foreach (Transaction x in partial) s += x.Amount;

    if (s == target)
    {
        matchedLists.Add(partial);
    }

    if (s > target)
        return;

    for (int i = 0; i < transactions.Count; i++)
    {
        List<Transaction> remaining = new List<Transaction>();
        Transaction n = new Transaction(0, transactions[i].ID, transactions[i].Amount);
        for (int j = i + 1; j < transactions.Count; j++) remaining.Add(transactions[j]);

        List<Transaction> partial_rec = new List<Transaction>(partial);
        partial_rec.Add(new Transaction(n.MatchNumber, n.ID, n.Amount));
        sum_up_recursive(remaining, target, partial_rec, ref matchedLists);
    }
}

With Transaction defined as:

class Transaction
{
    public int ID;
    public decimal Amount;
    public int MatchNumber;

    public Transaction(int matchNumber, int id, decimal amount)
    {
        ID = id;
        Amount = amount;
        MatchNumber = matchNumber;
    }
}
anothershrubery
    [Wrong site](http://meta.stackexchange.com/q/165519/299295) I think... – Sinatr Aug 05 '16 at 12:30
  • Are there lots of duplicate values in the list? – samgak Aug 05 '16 at 12:33
  • No, all values are unique, we are currently working to narrow down the list that we select from, but it won't probably affect the set that much. – anothershrubery Aug 05 '16 at 12:59
  • @Sinatr I think this is the correct area as I am specifically looking at the current C# implementation of an algorithm I have. – anothershrubery Aug 05 '16 at 13:03
    @anothershrubery, [codereview](http://codereview.stackexchange.com/) - you have working code and want to improve it, [programmers](http://programmers.stackexchange.com/) - optimal algorithm (language agnostic or `c#`). Stackoverflow is good if you have a bug (not working code) or run into issues (performance). I am not insisting, but I think you are after a better algorithm. Another thing is that you do not explain yours, but it looks like a straightforward one (recursive iteration), which is memory efficient but has poor performance. – Sinatr Aug 05 '16 at 13:24

4 Answers


As already mentioned, your problem can be solved by a pseudo-polynomial algorithm in O(n*G), with n the number of items and G the targeted sum.

The first question: is it possible to achieve the targeted sum G at all? The following Python code solves it (I have no C# on my machine):

def subsum(values, target):
    reached = [False] * (target + 1)  # initially no sums are reachable at all
    reached[0] = True  # with 0 elements we can only achieve the sum 0
    for val in values:
        for s in reversed(range(target + 1)):  # for target, target-1, ..., 0
            # if subsum s can be reached, then we can add the current value
            # to this sum and build a new reachable sum
            if reached[s] and s + val <= target:
                reached[s + val] = True
    return reached[target]

What is the idea? Let's consider values [1,2,3,6] and target sum 7:

  1. We start with an empty set - the possible sum is obviously 0.
  2. Now we look at the first element 1 and have two options: take it or not. That leaves us with possible sums {0,1}.
  3. Now looking at the next element 2: this leads to possible sums {0,1} (not taking) + {2,3} (taking).
  4. Until now, not much difference from your approach. But now, for element 3, we have possible sums a. {0,1,2,3} for not taking and b. {3,4,5,6} for taking, resulting in {0,1,2,3,4,5,6} as possible sums. The difference from your approach is that there are two ways to get to 3, and your recursion will be started twice from there (which is not needed). Calculating basically the same stuff over and over again is the problem with your approach and why the proposed algorithm is better.
  5. As a last step we consider 6 and get {0,1,2,3,4,5,6,7} as possible sums.
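The steps above can be reproduced with a few lines of Python (a set-based sketch, equivalent to the boolean array in the code above):

```python
def reachable_sums(values, target):
    # Start with the empty set: only the sum 0 is reachable.
    sums = {0}
    for val in values:
        # Every sum reached so far can also be extended by the current
        # value, as long as we do not overshoot the target.
        sums |= {s + val for s in sums if s + val <= target}
    return sums

# Walking through the example values [1, 2, 3, 6] with target 7:
print(sorted(reachable_sums([1, 2, 3, 6], 7)))  # [0, 1, 2, 3, 4, 5, 6, 7]
```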

But you also need the subset that leads to the targeted sum; for this we just remember which element was taken to achieve each subsum. This version returns a subset which results in the target sum, or None otherwise:

def subsum(values, target):
    reached = [False] * (target + 1)
    val_ids = [-1] * (target + 1)
    reached[0] = True  # with 0 elements we can only achieve the sum 0

    for val_id, val in enumerate(values):
        for s in reversed(range(target + 1)):  # for target, target-1, ..., 0
            if reached[s] and s + val <= target:
                reached[s + val] = True
                val_ids[s + val] = val_id

    # reconstruct the subset for target:
    if not reached[target]:
        return None  # means not possible
    result = []
    current = target
    while current != 0:  # search backwards, jumping from predecessor to predecessor
        val_id = val_ids[current]
        result.append(val_id)
        current -= values[val_id]
    return result

As another approach, you could use memoization to speed up your current solution, remembering for each state (subsum, number_of_elements_not_considered) whether it is possible to achieve the target sum. But I would say standard dynamic programming is a less error-prone possibility here.
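A minimal sketch of that memoized variant (Python; assuming non-negative amounts, with the state being the pair of current index and remaining sum):

```python
from functools import lru_cache

def can_reach(values, target):
    # Memoize on (index, remaining): each state is computed at most once,
    # which is what removes the repeated work from the plain recursion.
    @lru_cache(maxsize=None)
    def go(i, remaining):
        if remaining == 0:
            return True
        if remaining < 0 or i == len(values):
            return False
        # Either use values[i] or skip it.
        return go(i + 1, remaining - values[i]) or go(i + 1, remaining)

    return go(0, target)
```

This only answers the yes/no question; collecting all matching subsets still requires walking the memoized states back.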

ead

Yes.

I can't provide full code at the moment, but instead of iterating over the list of transactions repeatedly until finding matches (O(n²)), try this concept:

  1. set up a hashtable with the existing transaction amounts as entries, as well as the sum of each pair of transactions, assuming each value is made up of at most two transactions (e.g. weekend credit card processing).
  2. for each total, look it up in the hashtable - the sets of transactions in that slot are the list of matching transactions.

Instead of O(n²) lookups, you can get it down to roughly 4·n operations, which would make a noticeable difference in speed.

Good luck!
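If the at-most-two-transactions assumption holds, the idea can be sketched in Python with a dict mapping each achievable sum to the index tuples that produce it (names here are illustrative, not from the answer):

```python
from itertools import combinations

def build_pair_table(amounts):
    # Map every sum of one or two transactions to the index tuples producing it.
    table = {}
    for i, a in enumerate(amounts):
        table.setdefault(a, []).append((i,))
    for (i, a), (j, b) in combinations(enumerate(amounts), 2):
        table.setdefault(a + b, []).append((i, j))
    return table

amounts = [100, 200, 300, 400, 500]
matches = build_pair_table(amounts).get(600, [])  # [(0, 4), (1, 3)]
```

Note that building the pairwise table is itself O(n²) in time and space, so this only pays off when many totals are matched against the same table - and, as the comment below points out, it cannot find matches of three or more transactions.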

Mike Brake
  • The value can be made up of more than 2 transactions. There is no limit to the number of transactions therefore I don't expect this to work? – anothershrubery Aug 05 '16 at 13:01

Dynamic programming can solve this problem efficiently. Assume you have n transactions and the target sum is m: we can solve it in just O(n·m).

See the Knapsack problem. For this problem we can define dp[i][sum] as the number of subsets of the first i transactions that add up to sum. The recurrence:

for i = 1 to n:
    dp[i][sum] = dp[i - 1][sum] + dp[i - 1][sum - amount_i]

dp[n][sum] is the count you need, and you need to add some bookkeeping to recover what all the subsets are.
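That counting DP can be sketched in Python with a 1-D rolling array over sums (assuming positive integer amounts):

```python
def count_subsets(amounts, target):
    # dp[s] = number of subsets of the amounts seen so far that add up to s.
    dp = [0] * (target + 1)
    dp[0] = 1  # the empty subset
    for a in amounts:
        # Iterate sums downwards so each amount is used at most once.
        for s in range(target, a - 1, -1):
            dp[s] += dp[s - a]
    return dp[target]

# The question's example: 3 subsets of [100..500] sum to 600.
print(count_subsets([100, 200, 300, 400, 500], 600))  # 3
```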


You have a couple of practical assumptions here that would make brute force with smartish branch pruning feasible:

  • items are unique, hence you wouldn't get a combinatorial blow-up of valid subsets (i.e. (1,1,1,1,1,1,1,1,1,1,1,1,1) adding up to 3)
  • if the number of resulting feasible sets is still huge, you would run out of memory collecting them before running into total runtime issues.
  • ordering input ascending would allow for an easy early-stop check - if your remaining sum is smaller than the current element, then none of the yet-unexamined items could possibly be in a result (as the current and subsequent items only get bigger)
  • keeping running sums would speed up each step, as you wouldn't be recalculating it over and over again

Here's a bit of code:

public static List<T[]> SubsetSums<T>(T[] items, int target, Func<T, int> amountGetter)
{
    Stack<T> unusedItems = new Stack<T>(items.OrderByDescending(amountGetter));
    Stack<T> usedItems = new Stack<T>();
    List<T[]> results = new List<T[]>();
    SubsetSumsRec(unusedItems, usedItems, target, results, amountGetter);
    return results;
}

public static void SubsetSumsRec<T>(Stack<T> unusedItems, Stack<T> usedItems, int targetSum, List<T[]> results, Func<T, int> amountGetter)
{
    if (targetSum == 0)
        results.Add(usedItems.ToArray());
    if (targetSum < 0 || unusedItems.Count == 0)
        return;
    var item = unusedItems.Pop();
    int currentAmount = amountGetter(item);
    if (targetSum >= currentAmount)
    {
        // case 1: use current element
        usedItems.Push(item);
        SubsetSumsRec(unusedItems, usedItems, targetSum - currentAmount, results, amountGetter);
        usedItems.Pop();
        // case 2: skip current element
        SubsetSumsRec(unusedItems, usedItems, targetSum, results, amountGetter);
    }
    unusedItems.Push(item);
}

I've run it against a 100k-item input that yields around 1k results in under 25 ms, so it should be able to handle your 740 case with ease.
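The same sorted-input/branch-pruning idea in Python (a sketch, not a direct port of the C# above, and again assuming positive amounts), applied to the transactions from the question:

```python
def subset_sums(amounts, target):
    # Sort ascending so that once an amount exceeds the remaining target,
    # no later amount can be part of the current subset either.
    order = sorted(range(len(amounts)), key=lambda i: amounts[i])
    results = []

    def rec(pos, remaining, chosen):
        if remaining == 0:
            results.append([order[i] for i in chosen])
        for i in range(pos, len(order)):
            a = amounts[order[i]]
            if a > remaining:
                break  # everything after this is larger: prune the branch
            chosen.append(i)
            rec(i + 1, remaining - a, chosen)
            chosen.pop()

    rec(0, target, [])
    return results  # each result is a list of indices into amounts

# The question's example: index sets {0,1,2}, {0,4}, {1,3} all sum to 600.
print(subset_sums([100, 200, 300, 400, 500], 600))
```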