Recovering Subsets in Subset Sum Problem - Not All Subsets Appear

Question

I was brushing up on dynamic programming (DP) when I came across this problem. I managed to use DP to count how many solutions the subset sum problem has:

def SetSum(num_set, num_sum):

   #Initialize DP matrix with base cases set to 1
   matrix = [[0 for i in range(0, num_sum+1)] for j in range(0, len(num_set)+1)]
   for i in range(len(num_set)+1): matrix[i][0] = 1

   for i in range(1, len(num_set)+1): #Iterate through set elements
       for j in range(1, num_sum+1):   #Iterate through sum
           if num_set[i-1] > j:    #When current element is greater than sum take the previous solution
               matrix[i][j] = matrix[i-1][j]
           else:
               matrix[i][j] = matrix[i-1][j] + matrix[i-1][j-num_set[i-1]]

   #Retrieve elements of subsets    
   subsets = SubSets(matrix, num_set, num_sum)

   return matrix[len(num_set)][num_sum]

Based on Subset sum - Recover Solution, I used the following method to retrieve the subsets since the set will always be sorted:

def SubSets(matrix, num_set, num):

   #Initialize variables
   height = len(matrix)
   width = num
   subset_list = []
   s = matrix[0][num-1] #Keeps track of number until a change occurs

   for i in range(1, height):
       current = matrix[i][width]
       if current > s:
           s = current #keeps track of changing value
           cnt = i -1 #backwards counter, -1 to exclude current value already appended to list
           templist = []   #to store current subset
           templist.append(num_set[i-1]) #Adds current element to subset
           total = num - num_set[i-1] #Initial total will be sum - max element

           while cnt > 0:  #Loop backwards to find remaining elements
               if total >= num_set[cnt-1]: #Takes current element if it is less than total
                   templist.append(num_set[cnt-1])
                   total = total - num_set[cnt-1]
               cnt = cnt - 1

           templist.sort()
           subset_list.append(templist) #Add subset to solution set

   return subset_list

However, since it is a greedy approach it only works when the max element of each subset is distinct. If two subsets have the same max element then it only returns the one with the larger values. So for elements [1, 2, 3, 4, 5] with sum of 10 it only returns

[1, 2, 3, 4] , [1, 4, 5]

When it should return

[1, 2, 3, 4] , [2, 3, 5] , [1, 4, 5]

I could add another loop inside the while loop to leave out each element but that would increase the complexity to O(rows^3) which can potentially be more than the actual DP, O(rows*columns). Is there another way to retrieve the subsets without increasing the complexity? Or to keep track of the subsets while the DP approach is taking place? I created another method that can retrieve all of the unique elements in the solution subsets in O(rows):

def RecoverSet(matrix, num_set):
   height = len(matrix) - 1
   width = len(matrix[0]) - 1
   subsets = []

   while height > 0:
       current = matrix[height][width]
       top = matrix[height-1][width]

       if current > top:
           subsets.append(num_set[height-1])
       if top == 0:
           width = width - num_set[height-1]
       height -= 1

   return subsets

Which would output [1, 2, 3, 4, 5]. However, getting the actual subsets from it seems like solving the subset problem all over again. Any ideas/suggestions on how to store all of the solution subsets (not print them)?

adrien_vdb · Answer 1 · 2022-01-04T08:29:49.713

That's a very good question, and it seems you mostly have the right intuition.

The DP approach builds a 2D table that encodes how many subsets sum to the desired target sum, which takes O(target_sum*len(num_set)) time.

Recovering all solutions is another story, because the number of solution subsets can be very large, in fact much larger than the table you built while running the DP algorithm (in the worst case it grows exponentially with len(num_set)). You can use the table as a guide, but enumerating every subset may still take a long time. You find them by going backwards through the recurrence that defined your table (the if-else in your code that fills it). What do I mean by that?
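To make the size argument concrete, here is a minimal sketch that compares the number of table cells with the number of solution subsets for one input. The helper name count_subsets and the choice of the numbers 1 to 20 are only illustrative assumptions, and the counter uses a 1D array rather than your 2D matrix:

```python
def count_subsets(numbers, target):
    """Count subsets of `numbers` (positive ints) that sum to `target`."""
    counts = [0] * (target + 1)
    counts[0] = 1  # the empty subset sums to 0
    for x in numbers:
        # Go right to left so each number is used at most once.
        for s in range(target, x - 1, -1):
            counts[s] += counts[s - x]
    return counts[target]


numbers = list(range(1, 21))   # 1, 2, ..., 20
target = sum(numbers) // 2     # 105
table_cells = (len(numbers) + 1) * (target + 1)

print("table cells:", table_cells)
print("solution subsets:", count_subsets(numbers, target))
```

For this input the count is already larger than the number of cells in the 2D table, so no method that stores or lists every subset can stay within O(rows*columns).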

Say you want to find the solutions with only the filled table at your disposal. First check that the entry at row len(num_set) and column num is > 0, which indicates that at least one subset sums to num. From there, there are two possibilities. Either the last number in num_set is used in a solution, in which case we must check whether some subset of the remaining numbers sums to num-num_set[-1]; this is one branch of the recursion. Or the last number is not used, in which case we must check whether the remaining numbers alone can still sum to num; this is the other branch.

If you keep going, you will see that the recovery amounts to running the recursion backwards. By keeping track of the numbers chosen along the way (that is, the different paths through the table that lead to the desired sum), you can retrieve all solutions. Bear in mind that the running time may be extremely long, because we now want to list every solution, not just know that one exists.

This code should be what you are looking for to recover the solutions from the filled matrix:

def recover_sol(matrix, set_numbers, target_sum):
    up_to_num = len(set_numbers)
    
    ### BASE CASES (BOTTOM OF RECURSION) ###

    # If the target_sum becomes negative or there is no solution in the matrix, then 
    # return an empty list and inform that this solution is not a successful one
    if target_sum < 0 or matrix[up_to_num][target_sum] == 0:
        return [], False

    # If bottom of recursion is reached, that is, target_sum is 0, just return an empty list
    # and inform that this is a successful solution
    if target_sum == 0:
        return [], True
    
    ### IF NOT BASE CASE, NEED TO RECURSE ###

    # Case 1: last number in set_numbers is not used in solution --> same target but one item less
    s1_sols, success1 = recover_sol(matrix, set_numbers[:-1], target_sum)

    # Case 2: last number in set_numbers is used in solution --> target is lowered by item up_to_num
    s2_sols, success2 = recover_sol(matrix, set_numbers[:-1], target_sum - set_numbers[up_to_num-1])

    # If Case 2 is a success but bottom of recursion was reached
    # so that it returned an empty list, just set current sol as the current item
    if s2_sols == [] and success2:
        # The set of solutions is just the list containing one item (so this explains the list in list)
        s2_sols = [[set_numbers[up_to_num-1]]]

    # Otherwise prepend the current number to every solution of Case 2
    # (if Case 2 failed, s2_sols is empty and stays empty)
    else:
        s2_sols = [[set_numbers[up_to_num-1]] + s2_subsol for s2_subsol in s2_sols]

    # Join lists of solutions for both Cases, and set success value to True 
    # if either case returns a successful solution
    return s1_sols + s2_sols, success1 or success2

For the full solution with matrix filling AND recovering of solutions you can then do

def subset_sum(set_numbers, target_sum):
    n_numbers = len(set_numbers)

    #Initialize DP matrix with base cases set to 1
    matrix = [[0 for i in range(0, target_sum+1)] for j in range(0, n_numbers+1)]
    for i in range(n_numbers+1): 
        matrix[i][0] = 1

    for i in range(1, n_numbers+1): #Iterate through set elements
        for j in range(1, target_sum+1):   #Iterate through sum
            if set_numbers[i-1] > j:    #When current element is greater than sum take the previous solution
                matrix[i][j] = matrix[i-1][j]
            else:
                matrix[i][j] = matrix[i-1][j] + matrix[i-1][j-set_numbers[i-1]]
 
    return recover_sol(matrix, set_numbers, target_sum)[0]
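
As a variation on the same backward recursion, here is a sketch that walks the table by index instead of slicing the list on every call, and yields subsets one at a time. The names build_table and iter_subsets are hypothetical, and the sketch assumes all numbers are positive integers; it is one possible way to write it, not the only one:

```python
def build_table(numbers, target):
    """Counting table: table[i][j] is the number of subsets of
    numbers[:i] that sum to j (numbers must be positive ints)."""
    n = len(numbers)
    table = [[0] * (target + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        table[i][0] = 1  # the empty subset sums to 0
    for i in range(1, n + 1):
        for j in range(1, target + 1):
            table[i][j] = table[i - 1][j]
            if numbers[i - 1] <= j:
                table[i][j] += table[i - 1][j - numbers[i - 1]]
    return table


def iter_subsets(table, numbers, i, j, chosen=()):
    """Yield every subset of numbers[:i] that sums to j, using the
    table to skip branches that contain no solution."""
    if j == 0:
        yield list(chosen)
        return
    if i == 0 or table[i][j] == 0:
        return
    # Branch 1: numbers[i-1] is not used.
    yield from iter_subsets(table, numbers, i - 1, j, chosen)
    # Branch 2: numbers[i-1] is used, so the remaining target shrinks.
    x = numbers[i - 1]
    if x <= j:
        yield from iter_subsets(table, numbers, i - 1, j - x, (x,) + chosen)


numbers = [1, 2, 3, 4, 5]
table = build_table(numbers, 10)
solutions = list(iter_subsets(table, numbers, len(numbers), 10))
print(solutions)  # [[1, 2, 3, 4], [2, 3, 5], [1, 4, 5]]
```

Because a branch is only followed when its table entry is nonzero, every path that is explored ends in at least one solution. Wrapping the generator in list() stores all subsets; iterating over it directly avoids holding them all in memory at once.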

Cheers!
