Subset sum algorithm a little faster than 2^(n/2) in worst time?

Question

After analyzing the fastest subset sum algorithm which runs in 2^(n/2) time, I noticed a slight optimization that can be done. I'm not sure if it really counts as an optimization and if it does, I'm wondering if it can be improved by recursion.

Basically from the original algorithm: http://en.wikipedia.org/wiki/Subset_sum_problem (see part with title Exponential time algorithm)

it takes the list and splits it into two
then it generates the sorted power sets of both in 2^(n/2) time
then it does a linear search in both lists to see if 1 value in both lists sum to x using a clever trick

In my version with the optimization

it takes the list and removes the last element last
then it splits the list in two
then it generates the sorted power sets of both in 2^((n-1)/2) time
then it does a linear search in both lists to see if 1 value in both lists sum to x or x-last (at same time with same running time) using a clever trick

If it finds either, then I will know it worked. I tried using python time functions to test with lists of size 22, and my version is coming like twice as fast apparently.

After running the below code, it shows

0.050999879837   <- the original algorithm
0.0250000953674   <- my algorithm

My logic for the recursion part is, well if it works for a size n list in 2^((n-1)/1) time, can we not repeat this again and again?

Does any of this make sense, or am I totally wrong?

Thanks

I created this python code:

from math import log, ceil, floor
import helper  # my own code
from random import randint, uniform
import time

# gets a list of unique random floats
# s = how many random numbers
# l = smallest float can be
# h = biggest float can be
def getRandomList(s, l, h):    
    lst = []
    while len(lst) != s:
        r = uniform(l,h)
        if not r in lst:
            lst.append(r)
    return lst

# This just generates the two powerset sorted lists that the 2^(n/2) algorithm makes.
# This is just a lazy way of doing it, this running time is way worse, but since
# this can be done in 2^(n/2) time, I just pretend its that running time lol
def getSortedPowerSets(lst):
    n = len(lst)
    l1 = lst[:n/2]
    l2 = lst[n/2:]

    xs = range(2**(n/2))
    ys1 = helper.getNums(l1, xs)
    ys2 = helper.getNums(l2, xs)

    return ys1, ys2

# this just checks using the regular 2^(n/2) algorithm to see if two values
# sum to the specified value
def checkListRegular(lst, x):
    lst1, lst2 = getSortedPowerSets(lst)

    left = 0
    right = len(lst2)-1
    while left < len(lst1) and right >= 0:
        sum = lst1[left] + lst2[right]
        if sum < x:
            left += 1
        elif sum > x:
            right -= 1
        else:
            return True
    return False

# this is my improved version of the above version
def checkListSmaller(lst, x):
    last = lst.pop()
    x1, x2 = x, x - last
    return checkhelper(lst, x1, x2)

# this is the same as the function 'checkListRegular', but it checks 2 values
# at the same time
def checkhelper(lst, x1, x2):    
    lst1, lst2 = getSortedPowerSets(lst)    

    left = [0,0]
    right = [len(lst2)-1, len(lst2)-1]

    while 1:
        check = 0
        if left[0] < len(lst1) and right[0] >= 0:
            check += 1
            sum = lst1[left[0]] + lst2[right[0]]
            if sum < x1:
                left[0] += 1
            elif sum > x1:
                right[0] -= 1
            else:
                return True
        if left[1] < len(lst1) and right[1] >= 0:
            check += 1
            sum = lst1[left[1]] + lst2[right[1]]
            if sum < x2:
                left[1] += 1
            elif sum > x2:
                right[1] -= 1
            else:
                return True
        if check == 0:
            return False


n = 22
lst = getRandomList(n, 1, 3000)


startTime = time.time()
print checkListRegular(lst, -50)  # -50 so it does worst case scenario
startTime2 = time.time()
print checkListSmaller(lst, -50)  # -50 so it does worst case scenario
startTime3 = time.time()

print (startTime2 - startTime)
print (startTime3 - startTime2)

This is the helper library which I just use to generate the powerset list.

def dec_to_bin(x):
    return int(bin(x)[2:])

def getNums(lst, xs):
    sums = []
    n = len(lst)
    for i in xs:
        bin = str(dec_to_bin(i))
        bin = (n-len(bin))*"0" + bin
        chosen_items = getList(bin, lst)
        sums.append(sum(chosen_items))
    sums.sort()
    return sums

def getList(binary, lst):
    s = []
    for i in range(len(binary)):
        if binary[i]=="1":
            s.append(float(lst[i]))
    return s

without reading the wiki or your code 2^((n-1)/2) = 2^(n/2)/2^(1/2) ie same asymptotic performance bound — user3125280, Aug 09 '14 at 03:44

score 3 · Accepted Answer · answered Aug 09 '14 at 03:49

3

then it generates the sorted power sets of both in 2^((n-1)/2) time

OK, since now the list has one less lement. However, this is not a big deal its just a constant time improvement of 2^(1/2)...

then it does a linear search in both lists to see if 1 value in both lists sum to x or x-last (at same time with same running time) using a clever trick

... and this improvement will go away because now you do twice as many operations to check for both x and x-last sums instead of only for x

can we not repeat this again and again?

No you can't, for the same reason why you couldn't split the original algorithm again and again. The trick only works for once because once you start looking for values in more than two lists you can't use the sorting trick anymore.

answered Aug 09 '14 at 03:49

hugomg

68,213
24
160
246

But then what about the time values showing my version is faster? Since you claim the second part negates the improvement in the first part, shouldn't the time in the end be the same as the original algorithm time? – omega Aug 09 '14 at 04:02
@omega if you're talking about big-O time, you need a series of times over various `n` to say it's better than `2^(n/2)`. A single timing result may just be a constant time improvement. – TheSoundDefense Aug 09 '14 at 04:13
2

@omega both the linear search and subset sum generation are O(2N/2) - everytime you take one off the end, you reduce the subset sum part by 1/2^(1/2) and double the number of linear searches (increase by 2 times) therefore increasing the theoretical total time by around 2^(1/2) - the real world speed improvement is likely because the subset sum is faster than the linear search *by a constant factor only* – user3125280 Aug 09 '14 at 04:16

Subset sum algorithm a little faster than 2^(n/2) in worst time?

1 Answers1