2

So I tried asking this before, but I guess I wasn't really clear enough with what I was looking for. I'm making an optimal string alignment algorithm, it's really just a dynamic programming problem. So I decided to write it recursively. The program consists of two parts:

  • Finding the "edit distance" between two words, where you minimize alignments based on some associated cost. For example, aligning the same letter costs 0, aligning two vowels costs 0.5, but aligning a letter with a gap costs 1.
  • Visualizing the alignment: i.e. putting the strings on top of each other with the optimal alignment of characters and gaps.

I think my edit distance works. I'm getting the same values as my peers and there seem to be no immediate problems. However, I'm having a hard time figuring out how to recover the sequence of matchings, insertions, and deletions to visualize the alignment. My problem comes from the fact that I have a recursive function that takes the minimum of three recursive calls. So, I end up with a longer sequence than necessary because each recursive call appends a "move" (match, insertion, deletion), that may not be used because it is not the least costly one.

Here is my code:

newseq = []
@memoize
def opt(a, b):
    global newseq # Visual Alignment 'move' sequence
    gap = 1 # Gap cost
    if not a: 
        return len(b)
    if not b: 
        return len(a)

    if a and b:
        p1 = a[0] in v   # First letters vowells?
        p2 = b[0] in v   
        if a[0] == b[0]: # First letters equal each other?
            alpha = 0
        elif p1 ^ p2:    # Vowel and Consonant?
            alpha = 1.2
        elif p1 and p2:  # Vowel and Vowel?
            alpha = 0.5
        else:            # Consonant and Consonant?
            alpha = 0.6

    r1 = opt(a[1:], b[1:]) + alpha
    r2 = opt(a[1:], b) + gap
    r3 = opt(a, b[1:]) + gap
    # Reset newseq
    newseq = newseq[:-3]

    # Takes min of recursive calls, and gives associated 'move'
    res = min((r1, 'Match'),      # Match
              (r2, 'Insertion'),  # Insertion
              (r3, 'Deletion'),   # Deletion
              key = lambda x: x[0])

    newseq.append(res[1])
    return res[0]

So yea, I know that what I have is not working. My global newseq variable currently has a length of one because I try to reset it by removing all the appends that happen during the recursive calls. How do I set up a way to record the sequence of 'moves' that make up the optimal alignment with this recursive algorithm?

Edit: Here is my memoize decorator function:

def memoize(f):
    cache = {}
    def decorated_function(*args):
        if args in cache:
            return cache[args]
        else:
            cache[args] = f(*args)
            return cache[args]
    return decorated_function
atb
  • 943
  • 4
  • 14
  • 30

1 Answers1

1

1) Pass as an argument through your recursive function a Stack (or other collection).

2) When you call yourself recursively, also push onto the stack what step you are taking (e.g. using an enumeration of step types and ints/chars/strings/whatever indicating what it is doing).

3) When you return from the call in 2), pop the stack and repeat 2).

4) When you have a solution, you can store the stack associated with its result/score.

Patashu
  • 21,443
  • 3
  • 45
  • 53
  • okay cool! This is what I was looking for. I actually tried something like this, but because of my decorator function memoize, I get an error "TypeError: unhashable type list" when I pass the alignment_sequence into the function. Do you know how I fix or avoid this? I've editing in my memoize function if that helps – atb Mar 03 '13 at 23:48
  • @alexthebake List needs to be hashable, or else you can't put lists in a dictionary. Think about it - how is it going to make a key if it can't hash it? – Patashu Mar 03 '13 at 23:54
  • That makes sense. What should I do then, make the list into a tuple? Can tuples still act as stacks? – atb Mar 03 '13 at 23:59
  • @alexthebake I just remembered that (I think) python only accepts immutable objects for hashing. So you'd need a tuple, yeah, or something else that's frozen and can be hashed. – Patashu Mar 04 '13 at 00:26