You could split all three strings into lists:
    list1 = list(str1)
and then walk list3 with the same algorithm you use now, checking whether list3[i] is equal to list1[0] or list2[0]. If it is, you'd del the item from the appropriate list.
Premature list end could then be caught as an exception.
The algorithm would be exactly the same, but the implementation ought to be more performant.
UPDATE: turns out it actually isn't (about double the time). Oh well, might be useful to know.
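The list-based idea can be sketched roughly as follows. This is only an illustration, not the code that was benchmarked: the names `_take` and `is_shuffle_greedy` are made up for this sketch, and the walk is greedy (it always tries list1 first), so it can reject a valid shuffle when the same letter heads both lists.

```python
def _take(lst, ch):
    """Remove lst[0] if it equals ch; return True when an item was removed."""
    try:
        if lst[0] == ch:
            del lst[0]
            return True
    except IndexError:
        # Premature list end: nothing left to match in this list.
        pass
    return False


def is_shuffle_greedy(str1, str2, str3):
    """Greedy list walk (hypothetical helper); prefers list1 on ties."""
    list1, list2 = list(str1), list(str2)
    for ch in str3:
        if not (_take(list1, ch) or _take(list2, ch)):
            return False
    # Both lists must be fully consumed for str3 to be a shuffle.
    return not list1 and not list2
```

Note that `del lst[0]` has to shift every remaining element, which may be part of the reason this variant turned out slower than plain index counters.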
And while benchmarking different scenarios, it turned out that unless it is guaranteed that the three string lengths are "exact" (i.e., len(p1)+len(p2) == len(p3)), the most effective optimization is to check the lengths first thing. This immediately discards all cases where the two input strings can't match the third because of bad string lengths.
Then I encountered some cases where the same letter is at the current position in both strings, and assigning it to list1 or list2 might lead to one of the strings no longer matching. In those cases the algorithm fails with a false negative, and fixing that requires recursion:
    def isinter(str1, str2, str3, check=True):
        # print("Checking %s %s and %s" % (str1, str2, str3))
        p1, p2, p3 = 0, 0, 0
        if check:
            # Cheap early exit: the lengths must add up exactly.
            if len(str1) + len(str2) != len(str3):
                return False
        while p3 < len(str3):
            if p1 < len(str1) and str3[p3] == str1[p1]:
                if p2 < len(str2) and str3[p3] == str2[p2]:
                    # does str3[p3] belong to str1 or str2? Try both.
                    if isinter(str1[p1+1:], str2[p2:], str3[p3+1:], False):
                        return True
                    if isinter(str1[p1:], str2[p2+1:], str3[p3+1:], False):
                        return True
                    return False
                p1 += 1
            elif p2 < len(str2) and str3[p3] == str2[p2]:
                p2 += 1
            else:
                return False
            p3 += 1
        return p1 == len(str1) and p2 == len(str2) and p3 == len(str3)
Then I ran some benchmarks on random strings. This is the instrumentation (notice that it always generates valid shuffles, which may yield biased results):
    import random
    import time

    for j in range(3, 50):
        # Build two random lowercase strings of up to j-1 characters each.
        str1 = ''
        str2 = ''
        for k in range(1, j):
            if random.choice([True, False]):
                str1 += chr(random.randint(97, 122))
            if random.choice([True, False]):
                str2 += chr(random.randint(97, 122))
        # Interleave them at random into a valid shuffle.
        p1 = 0
        p2 = 0
        str3 = ''
        while len(str3) < len(str1)+len(str2):
            if p1 < len(str1) and random.choice([True, False]):
                str3 += str1[p1]
                p1 += 1
            if p2 < len(str2) and random.choice([True, False]):
                str3 += str2[p2]
                p2 += 1
        # Total seconds over 10^6 calls equals microseconds per call.
        # isShuffle2 is the cached+DP version being compared (not shown here).
        a = time.time()
        for i in range(1000000):
            isShuffle2(str1, str2, str3)
        a = (time.time() - a)
        b = time.time()
        for i in range(1000000):
            isinter(str1, str2, str3)
        b = (time.time() - b)
        print("(%s,%s = %s) in %f against %f us" % (str1, str2, str3, a, b))
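Because the loop above only produces valid shuffles, one way to reduce that bias would be to mix in inputs that are guaranteed not to match. The following is a minimal sketch under my own assumptions: `random_shuffle_case` and `time_per_call` are hypothetical helpers, and the invalid case relies on '#' not being part of the alphabet.

```python
import random
import string
import timeit


def random_shuffle_case(n, valid=True, alphabet=string.ascii_lowercase, rng=random):
    """Return (str1, str2, str3); str3 interleaves str1 and str2 when valid."""
    str1 = ''.join(rng.choice(alphabet) for _ in range(rng.randint(1, n)))
    str2 = ''.join(rng.choice(alphabet) for _ in range(rng.randint(1, n)))
    # Decide which positions of str3 are filled from str1; the rest use str2.
    picks = [True] * len(str1) + [False] * len(str2)
    rng.shuffle(picks)
    it1, it2 = iter(str1), iter(str2)
    chars = [next(it1) if from_first else next(it2) for from_first in picks]
    if not valid:
        # Assumption: '#' is outside the alphabet, so no shuffle can contain it.
        chars[rng.randrange(len(chars))] = '#'
    return str1, str2, ''.join(chars)


def time_per_call(fn, case, number=10000):
    """Average seconds per call of fn(*case), measured with timeit."""
    return timeit.timeit(lambda: fn(*case), number=number) / number
```

Calling `time_per_call(isinter, random_shuffle_case(10, valid=False))` alongside the valid cases would then show how each implementation behaves on rejections as well as on matches.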
The results seem to point to a superior efficiency of the cached+DP algorithm for short strings. When strings get longer (more than 3-4 characters), the cached+DP algorithm starts losing ground. At around length 10, the algorithm above runs twice as fast as the totally-recursive, cached version.
The DP algorithm performs better, but still worse than the one above, if the strings contain repeated characters (I did this by restricting the range from a-z to a-i) and if the overlap is slight. For example, in this case the DP loses by less than 2 us:
(cfccha,ddehhg = cfcchaddehhg) in 68.139601 against 66.826320 us
Not surprisingly, full overlap (one letter from each string in turn) shows the largest difference, with a ratio as high as 364:178 (a bit more than 2:1).
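For readers who want something to compare against, a cached recursive formulation of the same check might look like the sketch below. This is not the isShuffle2 used in the benchmark (that code is not shown here); `is_shuffle_memo` and `match` are names I made up, and the cache is rebuilt on every call, which is one plausible source of fixed overhead.

```python
from functools import lru_cache


def is_shuffle_memo(str1, str2, str3):
    """Memoized interleaving check (illustrative, not the benchmarked code)."""
    if len(str1) + len(str2) != len(str3):
        return False

    @lru_cache(maxsize=None)
    def match(i, j):
        # True if str3[i+j:] is an interleaving of str1[i:] and str2[j:].
        if i == len(str1) and j == len(str2):
            return True
        k = i + j
        if i < len(str1) and str1[i] == str3[k] and match(i + 1, j):
            return True
        return j < len(str2) and str2[j] == str3[k] and match(i, j + 1)

    return match(0, 0)
```

Each (i, j) pair is evaluated at most once, so the work is bounded by len(str1) * len(str2) states, whereas the uncached recursion can revisit the same suffixes many times when the strings share a lot of letters.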