Efficient computation of the overlapping between sequences

Question

Given two sequences s1 and s2, a supersequence of s1 and s2 is another sequences with length less than the sum of the lengths of s1 and s2 and that contains them. For example, for s1=[1,2,4,4], and s2=[4,4,9,7], a supersequence may be [1,2,4,4,9,7], and also [1,2,4,4,4,9,7].

I am trying to find an efficient implementation of a function f whose inputs are two sequences s1 and s2 and that does the following: first, compute the number of possible supersequences and then returns the position where the overlapping takes place (for simplicity, let's assume s1 always appears first in the supersequences).

For instance, taking the previous example, f([1,2,4,4], [4,4,9,7]) should return 2 and 3, the indexes where the second sequence starts in the two existing supersequences.

does s1 and s2 needs to be in order? Or can it be s2 before s1? — LBald, Jan 28 '21 at 23:37
This should be related to [LCS](https://en.wikipedia.org/wiki/Longest_common_substring_problem). It's called [SCS](https://en.wikipedia.org/wiki/Shortest_common_supersequence_problem). — wsdookadr, Jan 28 '21 at 23:40
You could have a look at [these other posts](https://stackoverflow.com/search?q=Shortest+common+supersequence). — wsdookadr, Jan 28 '21 at 23:46

Alain T. · Answer 1 · 2021-01-29T15:08:37.217

You can get the potential overlap positions with a list comprehension based on enumerate of the left side list.

idx = [ i for i,v1 in enumerate(s1) if v1==s2[0] ]

# [2, 3]

However, I would suggest a different overall strategy using a recursive generator to produce all the supersequences.

def superpose(s1,s2,inverted=False):
    if s1 and not inverted and s1[0] in s2:
        yield from superpose(s2,s1,True)
    if not s2: return
    if inverted and s2[0] not in s1:
        yield s1+s2;return
    for i,v1 in enumerate(s1):
        if v1 != s2[0]: continue
        yield from (s1[:i+1] + sp for sp in superpose(s1[i+1:],s2[1:],True))

output:

s1=[1,2,4,4]
s2=[4,4,9,7]

for sp in superpose(s1,s2): print(sp)

[1, 2, 4, 4, 9, 7]
[1, 2, 4, 4, 4, 9, 7]

for sp in superpose(s2,s1): print(sp) # insensitive to parameter order

[1, 2, 4, 4, 9, 7]
[1, 2, 4, 4, 4, 9, 7]

s1 = [1,2,3]
s2 = [2,4,1,6,2]
for sp in superpose(s1,s2): print(sp)

[1, 2, 3, 4, 1, 6, 2]
[2, 4, 1, 6, 2, 3]

If you need to find the shortest one, the generator can easily be fed to the min function:

min(superpose(s1,s2),key=len)    

[1, 2, 4, 4, 9, 7]

score 0 · Answer 2 · answered Jan 29 '21 at 00:11

I can't think of a simpler solution than this:

def f(s1, s2):
    indexes = []
    for i in range(len(s1)):
        seq_len = len(s1) - I
        if seq_len < len(s2) and s1[i:] == s2[:seq_len]:
            indexes.append(i)
    return indexes

And one-liner, if you like to live on the edge:

indexes = [i for i in range(len(s1)) if len(s1)-i>len(s2) and s1[i:]==s2[:len(s1)-i]]

Efficient computation of the overlapping between sequences

2 Answers2