How to fit strings using spaces, minimizing edit distance?

Question

I'm looking for an algorithm that fits two strings, filling them up with spaces if necessary to minimize edit distance between them:

fit('algorithm', 'lgrthm') == ' lg r thm'

There sure must be some prewritten algorithm for this. Any ideas?

I also want to point out the potential use of `difflib`'s `SequenceMatcher`. — Barney Szabolcs, Apr 03 '20 at 12:32
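As a rough illustration of the `SequenceMatcher` idea mentioned in the comment above, here is a minimal sketch. The helper name `fit_with_difflib` is hypothetical and not from the thread. It keeps only the characters that `get_matching_blocks()` reports as common to both strings and leaves a space everywhere else. Note that `SequenceMatcher` does not guarantee a minimal edit sequence, so treat this as an approximation.

```python
from difflib import SequenceMatcher


def fit_with_difflib(target, source):
    # Hypothetical helper: start with all spaces, one per target character.
    result = [' '] * len(target)
    matcher = SequenceMatcher(None, target, source, autojunk=False)
    for block in matcher.get_matching_blocks():
        # block.a is the start index in target, block.size the run length.
        for k in range(block.size):
            result[block.a + k] = target[block.a + k]
    return ''.join(result)


print(repr(fit_with_difflib('algorithm', 'lgrthm')))  # ' lg r thm'
```

Characters of `source` that have no match in `target` are silently dropped by this sketch.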

Dani Mesejo · Accepted Answer · 2020-04-01T22:36:25.210

You could do something like the following:

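def fit(target, source):
    # i indexes source, j indexes target
    i, j = 0, 0
    result = []
    while i < len(source) and j < len(target):
        if source[i] == target[j]:
            # Characters match: keep the character and advance in source
            result.append(source[i])
            i += 1
        else:
            # No match: leave a gap at this target position
            result.append(' ')
        j += 1

    return ''.join(result)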


test = [('algorithm', 'lgrthm'), ('pineapple', 'pine'), ('pineapple', 'apple'), ('pineapple', 'eale'),
        ('foo', 'fo'), ('stack', 'sak'), ('over', 'or'), ('flow', 'lw')]

for t, s in test:
    print(t)
    print(fit(t, s))
    print('---')

Output

algorithm
 lg r thm
---
pineapple
pine
---
pineapple
    apple
---
pineapple
   ea  le
---
foo
fo
---
stack
s a k
---
over
o  r
---
flow
 l w
---
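Note that the loop stops as soon as `source` is exhausted, which is why `pine` comes back without trailing spaces. If a result of the same length as `target` is wanted (an assumption about the desired behaviour, not something the question states), one option is to pad with `str.ljust`. The wrapper name `fit_padded` below is hypothetical.

```python
def fit(target, source):
    # Same greedy loop as in the answer above.
    i, j = 0, 0
    result = []
    while i < len(source) and j < len(target):
        if source[i] == target[j]:
            result.append(source[i])
            i += 1
        else:
            result.append(' ')
        j += 1
    return ''.join(result)


def fit_padded(target, source):
    # Hypothetical wrapper: pad on the right up to len(target).
    return fit(target, source).ljust(len(target))


print(repr(fit('pineapple', 'pine')))         # 'pine'
print(repr(fit_padded('pineapple', 'pine')))  # 'pine     '
```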

A perhaps better version is the following:

from collections import deque


def peek(q, default=None):
    """Perform a safe peek: return the first item, or default if the queue is empty."""
    return q[0] if q else default


def fit(target, source):
    ds = deque(source)
    # default=None never equals a character, so popleft() is never called on an empty deque
    return ''.join([ds.popleft() if peek(ds) == e else ' ' for e in target])

It is better in the sense that you do not need to keep track of the state variables i and j, as in the previous approach.
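Both versions above match greedily, which works when `source` is a subsequence of `target`. When it is not (for example `fit('abc', 'xbc')`), the greedy scan gets stuck on the first unmatched character. A possible alternative, sketched here under the assumption that dropping unmatched `source` characters is acceptable, is to align along a longest common subsequence. The name `fit_lcs` is hypothetical and not part of the original answer.

```python
def fit_lcs(target, source):
    n, m = len(target), len(source)
    # lcs[i][j] = length of the longest common subsequence
    # of target[i:] and source[j:].
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if target[i] == source[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    # Walk the table from the start, emitting one character per target position.
    result = []
    i = j = 0
    while i < n:
        if j < m and target[i] == source[j]:
            result.append(target[i])
            i += 1
            j += 1
        elif j < m and lcs[i][j + 1] > lcs[i + 1][j]:
            j += 1  # skip a source character that cannot be placed
        else:
            result.append(' ')  # gap at this target position
            i += 1
    return ''.join(result)


print(repr(fit_lcs('algorithm', 'lgrthm')))  # ' lg r thm'
print(repr(fit_lcs('abc', 'xbc')))           # ' bc'
```

This costs O(len(target) * len(source)) time and memory, compared with the linear greedy scan.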

andreis11 · Answer 2 · 2020-04-02T03:19:14.177


I took a naive but simple approach.

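def fit(word1, word2):
    A, B = list(word1), list(word2)

    if len(B) < len(A):
        # Pad B with a '1' placeholder up to the length of A
        # (this assumes neither word contains the character '1')
        B += (len(A) - len(B)) * ['1']
    else:
        return ''.join(x if x in B else ' ' for x in A)

    for i in range(len(B)):
        if A[i] != B[i]:
            # Mismatch: shift the rest of B right by one space
            B.insert(i, ' ')
    # Drop the placeholder padding
    return ''.join(x for x in B if x != '1')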

Test results:

algorithm lgrthm
 lg r thm
---
pineapple pine
pine     
---
pineapple apple
    apple
---
pineapple eale
   ea  le
---
foo fo
fo 
---
stack sak
s a k
---
over or
o  r
---
flow lw
 l w
---

edited Apr 02 '20 at 03:19

answered Apr 01 '20 at 21:54


Have you tried it with any other input? What happens when you put fit('pine','pineapple')? – C. Fennell Apr 01 '20 at 22:35
You could easily remove the length cases and get what you want (e.g. fit('pine','pineapple') == ' pine '). I added a second scenario where I removed the word-length check. – andreis11 Apr 01 '20 at 22:43
What about when you do something like "iapple" and "pineapple"? – C. Fennell Apr 01 '20 at 22:47
