Dynamic Programming for shortest subsequence that is not a subsequence of two strings

Question

Problem: Given two sequences s1 and s2 of '0' and '1', return the shortest sequence that is a subsequence of neither of the two sequences.

E.g. s1 = '011', s2 = '1101'. Return s_out = '00' as one possible result.

Note that substrings and subsequences are different: in a substring the characters must be contiguous, whereas in a subsequence they need not be.
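
To make the distinction concrete, here is a minimal sketch. The helper names `is_subsequence` and `is_substring` are illustrative only and are not part of either solution below.

```python
def is_subsequence(candidate, seq):
    """Return True if candidate can be obtained from seq by deleting characters."""
    it = iter(seq)
    # `ch in it` consumes the iterator up to the first match, so the
    # characters of candidate must appear in seq in the same order.
    return all(ch in it for ch in candidate)


def is_substring(candidate, seq):
    """Return True if candidate appears in seq as a contiguous block."""
    return candidate in seq


# '111' is a subsequence of '1101' (skip the '0') but not a substring of it.
print(is_subsequence('111', '1101'), is_substring('111', '1101'))
# '00' is a subsequence of neither example string, so it is a valid answer.
print(is_subsequence('00', '011'), is_subsequence('00', '1101'))
```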

My question: How is dynamic programming applied in the "Solution Provided" below and what is its time complexity?

My attempt computes all the subsequences of each string, giving sub1 and sub2. It appends a '1' or a '0' to each element of sub1, checks whether the new sequence is absent from sub2, and keeps the shortest one found. Here is my code:

My Solution

def get_subsequences(seq, index, subs, result): 
    if index == len(seq): 
        if subs: 
            result.add(''.join(subs))
    else:
        get_subsequences(seq, index + 1, subs, result)
        get_subsequences(seq, index + 1, subs + [seq[index]], result)

def get_bad_subseq(subseq):
    min_sub = ''
    length = float('inf')
    for sub in subseq:
        for char in ['0', '1']:
            if len(sub) + 1 < length and sub + char not in subseq:
                length = len(sub) + 1
                min_sub = sub + char
    return min_sub

Solution Provided (not mine)

How does it work and its time complexity?

The solution below looks similar to: http://kyopro.hateblo.jp/entry/2018/12/11/100507

def set_nxt(s, nxt):
    n = len(s)
    idx_0 = n + 1
    idx_1 = n + 1
    for i in range(n, 0, -1):
        nxt[i][0] = idx_0
        nxt[i][1] = idx_1
        if s[i-1] == '0':
            idx_0 = i
        else:
            idx_1 = i
    nxt[0][0] = idx_0
    nxt[0][1] = idx_1

def get_shortest(seq1, seq2):
    len_seq1 = len(seq1)
    len_seq2 = len(seq2)
    nxt_seq1 = [[len_seq1 + 1 for _ in range(2)] for _ in range(len_seq1 + 2)] 
    nxt_seq2 = [[len_seq2 + 1 for _ in range(2)] for _ in range(len_seq2 + 2)] 

    set_nxt(seq1, nxt_seq1)
    set_nxt(seq2, nxt_seq2)

    INF = 2 * max(len_seq1, len_seq2)
    dp = [[INF for _ in range(len_seq2 + 2)] for _ in range(len_seq1 + 2)]
    dp[len_seq1 + 1][len_seq2 + 1] = 0
    for i in range(len_seq1 + 1, -1, -1):
        for j in range(len_seq2 + 1, -1, -1):
            for k in range(2):
                if dp[nxt_seq1[i][k]][nxt_seq2[j][k]] < INF:
                    dp[i][j] = min(dp[i][j], dp[nxt_seq1[i][k]][nxt_seq2[j][k]] + 1)

    res = ""
    i = 0
    j = 0
    while i <= len_seq1 or j <= len_seq2:
        for k in range(2):
            if (dp[i][j] == dp[nxt_seq1[i][k]][nxt_seq2[j][k]] + 1):
                i = nxt_seq1[i][k]
                j = nxt_seq2[j][k]
                res += str(k)
                break
    return res

btilly · Accepted Answer · 2021-10-25T01:42:12.200

I am not going to work through it in detail, but the idea of this solution is to create a 2-D array indexed by every combination of a position in one sequence and a position in the other. It then populates this array with the length of the shortest sequence that, starting from each pair of positions, runs past the end of both.

Just constructing that array takes O(len(seq1) * len(seq2)) space (and therefore at least that much time). Filling it in takes a similar amount of time.

This is done with lots of bit twiddling that I don't want to track.
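
For readers who want the idea without the index bookkeeping, here is a minimal memoized sketch of the same kind of recurrence. It is not the provided solution: the names `shortest_non_subsequence`, `next_pos` and `best` are made up for illustration, it uses `str.find` instead of a precomputed table, and it stores whole strings rather than lengths, so it is slower than the array version. It also recurses once per output character, so very long inputs could hit Python's recursion limit.

```python
from functools import lru_cache


def shortest_non_subsequence(s1, s2):
    n, m = len(s1), len(s2)

    def next_pos(s, i, ch):
        # Position just past the first ch at or after index i.
        # len(s) + 1 is the "fell off the end" state and maps to itself.
        p = s.find(ch, i)
        return p + 1 if p != -1 else len(s) + 1

    @lru_cache(maxsize=None)
    def best(i, j):
        # Shortest string that is a subsequence of neither s1[i:] nor s2[j:].
        # State (n + 1, m + 1) means the string has already failed in both.
        if i > n and j > m:
            return ""
        options = [ch + best(next_pos(s1, i, ch), next_pos(s2, j, ch))
                   for ch in "01"]
        return min(options, key=len)

    return best(0, 0)


print(shortest_non_subsequence('011', '1101'))  # 00
```

There are at most (len(s1) + 2) * (len(s2) + 2) distinct states, which is where the quadratic bound comes from.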

I have another approach that is clearer to me that usually takes less space and less time, but in the worst case could be as bad. But I have not coded it up.

UPDATE:

Here it is, all coded up. With poor choices of variable names. Sorry about that.

# A trivial data class to hold a linked list for the candidate subsequences
# along with information about how far they match in the two sequences.
import collections
SubSeqLinkedList = collections.namedtuple('SubSeqLinkedList', 'value pos1 pos2 tail')

# This finds the position after the first match.  No match is treated as off the end of seq.
# Off the end is always reported as len(seq) + 1, so all such positions compare equal.
def find_position_after_first_match(seq, start, value):
    while start < len(seq) and seq[start] != value:
        start += 1
    return min(start, len(seq)) + 1

def make_longer_subsequence(subseq, value, seq1, seq2):
    pos1 = find_position_after_first_match(seq1, subseq.pos1, value)
    pos2 = find_position_after_first_match(seq2, subseq.pos2, value)
    gotcha = SubSeqLinkedList(value=value, pos1=pos1, pos2=pos2, tail=subseq)
    return gotcha

def minimal_nonsubseq(seq1, seq2):
    # We start with one candidate for how to start the subsequence
    # Namely an empty subsequence.  Length 0, matches before the first character.
    candidates = [SubSeqLinkedList(value=None, pos1=0, pos2=0, tail=None)]

    # Now we try to replace candidates with longer maximal ones - nothing of
    # the same length is better at going farther in both sequences.
    # We keep this list ordered by descending how far it goes in sequence1.
    # A candidate that is off the end of both sequences beats every other one,
    # so when it exists it is the only entry and shows up as candidates[0].
    while candidates[0].pos1 <= len(seq1) or candidates[0].pos2 <= len(seq2):
        longer = [make_longer_subsequence(candidate, value, seq1, seq2)
                  for candidate in candidates for value in ('0', '1')]
        # Farthest in sequence1 first; ties broken by farthest in sequence2.
        longer.sort(key=lambda c: (-c.pos1, -c.pos2))
        new_candidates = []
        for c in longer:
            # By the sort order c is no farther in pos1 than anything kept so far,
            # so it is only worth keeping if it goes strictly farther in pos2.
            if 0 == len(new_candidates) or new_candidates[-1].pos2 < c.pos2:
                new_candidates.append(c)
        # And now we throw away the ones we don't want.
        # Those that are on their way to a solution will be captured in the linked list.
        candidates = new_candidates

    answer = candidates[0]
    r_seq = [] # This winds up reversed.
    while answer.value is not None:
        r_seq.append(answer.value)
        answer = answer.tail

    return ''.join(reversed(r_seq))


print(minimal_nonsubseq('011', '1101'))
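
A quick way to sanity-check any implementation on small inputs is to compare its output length against an exhaustive search. The sketch below is only a testing aid under that assumption; `brute_force_shortest` and `is_subsequence` are hypothetical helper names, and the search is exponential in the answer length.

```python
from itertools import product


def is_subsequence(candidate, seq):
    it = iter(seq)
    # `ch in it` advances the iterator, so order within seq is respected.
    return all(ch in it for ch in candidate)


def brute_force_shortest(seq1, seq2):
    # Try every binary string in order of increasing length. This always
    # terminates: a run of '0' longer than both inputs is in neither.
    length = 1
    while True:
        for bits in product("01", repeat=length):
            cand = "".join(bits)
            if not is_subsequence(cand, seq1) and not is_subsequence(cand, seq2):
                return cand
        length += 1


print(brute_force_shortest('011', '1101'))  # 00
print(brute_force_shortest('100', '010'))   # 11
```

Several answers of the same length can exist, so compare lengths (and validity) rather than exact strings when checking another implementation against this.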

"combinations of positions in one array and the other" - do you mean relative to a character in s1 where is the next dissimilar character in s2. What is the idea behind your clearer approach? — jess, Apr 04 '20 at 06:02
@jess I added an implementation. Basically keep a "fringe" of the subsequences that go farthest out among the subsequences of the current length as a linked list. Eventually you find the shortest one that goes off the end. The tradeoff is a more complex data structure but you only record a node of a good subsequence once. — btilly, Apr 06 '20 at 03:42
what is the time complexity? Is it 2^(len(seq1) + len(seq2)). Is the space complexity: O(max(len(seq1), len(seq2)))? — jess, Apr 08 '20 at 20:34
@jess The worst case is `O(len(seq1) * len(seq2))` both time and space. — btilly, Apr 08 '20 at 21:17
The `if` condition ```if candidate1.pos1 < candidate2.pos1: # swap them. candidate1, candidate2 = candidate2, candidate1``` is repeated twice. — Dude901, Oct 24 '21 at 08:39
