I'm looking for an algorithm that fits two strings, filling them up with spaces if necessary to minimize edit distance between them:

fit('algorithm', 'lgrthm') == ' lg r thm'

There sure must be some prewritten algorithm for this. Any ideas?

Barney Szabolcs

2 Answers

You could do something like the following:

def fit(target, source):
    i, j = 0, 0
    result = []
    while i < len(source) and j < len(target):
        if source[i] == target[j]:
            result.append(source[i])
            i += 1
        else:
            result.append(' ')
        j += 1

    return ''.join(result)


test = [('algorithm', 'lgrthm'), ('pineapple', 'pine'), ('pineapple', 'apple'), ('pineapple', 'eale'),
        ('foo', 'fo'), ('stack', 'sak'), ('over', 'or'), ('flow', 'lw')]

for t, s in test:
    print(t)
    print(fit(t, s))
    print('---')

Output

algorithm
 lg r thm
---
pineapple
pine
---
pineapple
    apple
---
pineapple
   ea  le
---
foo
fo
---
stack
s a k
---
over
o  r
---
flow
 l w
---

A perhaps better version is the following:

from collections import deque


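def peek(q, default=None):
    """Perform a safe peek: if the queue is empty, return default."""
    return q[0] if q else default


def fit(target, source):
    ds = deque(source)
    # The default of None never equals a character, so popleft() is not
    # called on an empty deque even when target itself contains spaces.
    return ''.join([ds.popleft() if peek(ds) == e else ' ' for e in target])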

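It is better in the sense that you do not need to keep track of the index variables i, j as in the previous approach. Unlike the first version, it also pads the result to the full length of target: fit('pineapple', 'pine') returns 'pine' followed by five spaces.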
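
Both versions above scan target greedily, which works when source is a subsequence of target. Otherwise the characters of source that cannot be placed are silently dropped. As a minimal sketch of making that assumption explicit, the following uses a plain iterator in place of the deque. The names is_subsequence and fit_checked are hypothetical and not part of the answer above.

```python
def is_subsequence(source, target):
    """Return True if source can be obtained from target by deleting characters."""
    it = iter(target)
    # `c in it` advances the iterator until it finds c, so order is respected.
    return all(c in it for c in source)


def fit_checked(target, source):
    """Greedy fit that rejects inputs the greedy scan cannot place completely."""
    if not is_subsequence(source, target):
        raise ValueError(f"{source!r} is not a subsequence of {target!r}")
    it = iter(source)
    pending = next(it, None)  # next character of source still to be placed
    out = []
    for ch in target:
        if ch == pending:
            out.append(ch)
            pending = next(it, None)
        else:
            out.append(' ')
    return ''.join(out)


print(repr(fit_checked('algorithm', 'lgrthm')))  # ' lg r thm'
```

Raising on a non-subsequence is only one possible choice; returning a partial fit may be preferable depending on the use case.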

Dani Mesejo

I took a naive but simple approach.

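def fit(word1, word2):
    A, B = list(word1), list(word2)

    if len(B) < len(A):
        # Pad B to the length of A with a '1' sentinel.
        # NOTE: this assumes neither word contains a real '1'.
        B += (len(A) - len(B)) * ['1']
    else:
        # B is at least as long as A: keep the characters of A that
        # occur anywhere in B (a membership test, order is ignored).
        return ''.join(x if x in B else ' ' for x in A)

    # Wherever A and B disagree, shift the rest of B right by one space.
    for i in range(len(B)):
        if A[i] != B[i]:
            B.insert(i, ' ')
    # Drop the sentinel padding.
    return ''.join(x for x in B if x != '1')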

Test results:

algorithm lgrthm
 lg r thm
---
pineapple pine
pine     
---
pineapple apple
    apple
---
pineapple eale
   ea  le
---
foo fo
fo 
---
stack sak
s a k
---
over or
o  r
---
flow lw
 l w
---
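
On the question of a prewritten algorithm: the approaches above do not search for a best alignment when the shorter word is not a subsequence of the longer one. One option from the standard library is difflib.SequenceMatcher, sketched below. The name fit_aligned is hypothetical. Note that, per the difflib documentation, SequenceMatcher does not guarantee a minimal edit sequence, so this is a heuristic rather than a true edit-distance minimizer.

```python
from difflib import SequenceMatcher


def fit_aligned(target, source):
    """Place the characters of source that difflib matches under target."""
    out = [' '] * len(target)
    matcher = SequenceMatcher(None, target, source, autojunk=False)
    # Each block says target[a:a+size] == source[b:b+size];
    # the final block always has size 0 and changes nothing.
    for a, b, size in matcher.get_matching_blocks():
        out[a:a + size] = source[b:b + size]
    return ''.join(out)


print(repr(fit_aligned('algorithm', 'lgrthm')))  # ' lg r thm'
```

The result always has the length of target, and characters of source that difflib leaves unmatched are omitted.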
andreis11