I'm looking for an algorithm that fits two strings together, padding with spaces where necessary to minimize the edit distance between them:
fit('algorithm', 'lgrthm') == ' lg r thm'
There sure must be some prewritten algorithm for this. Any ideas?
You could do something like the following:
```python
def fit(target, source):
    i, j = 0, 0  # i indexes source, j indexes target
    result = []
    while i < len(source) and j < len(target):
        if source[i] == target[j]:
            # The next source character matches here: place it
            result.append(source[i])
            i += 1
        else:
            # No match at this position: pad with a space
            result.append(' ')
        j += 1
    return ''.join(result)

test = [('algorithm', 'lgrthm'), ('pineapple', 'pine'), ('pineapple', 'apple'), ('pineapple', 'eale'),
        ('foo', 'fo'), ('stack', 'sak'), ('over', 'or'), ('flow', 'lw')]

for t, s in test:
    print(t)
    print(fit(t, s))
    print('---')
```
Output:
```
algorithm
 lg r thm
---
pineapple
pine
---
pineapple
    apple
---
pineapple
   ea  le
---
foo
fo
---
stack
s a k
---
over
o  r
---
flow
 l w
---
```
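A note on when this loop works: it matches each character of source at the first suitable position in target, so all of source ends up in the output exactly when source is a subsequence of target (true for every test case above). If source contains a character that cannot be matched in the rest of target, the scan gets stuck on it, and that character and every later one are left out. The sketch below restates the loop and adds a hypothetical helper, `is_subsequence` (not part of the original answer), that checks this precondition:

```python
def fit(target, source):
    # Greedy loop from the answer above
    i, j = 0, 0
    result = []
    while i < len(source) and j < len(target):
        if source[i] == target[j]:
            result.append(source[i])
            i += 1
        else:
            result.append(' ')
        j += 1
    return ''.join(result)


def is_subsequence(source, target):
    """Hypothetical helper: True if source is target with some characters deleted."""
    it = iter(target)
    # `ch in it` consumes the iterator up to and including the first match
    return all(ch in it for ch in source)


for target, source in [('algorithm', 'lgrthm'), ('abc', 'xbc')]:
    fitted = fit(target, source)
    print(repr(fitted), is_subsequence(source, target))
# ' lg r thm' True
# '   ' False
```

For `('abc', 'xbc')` the leading `x` never matches, so `b` and `c` are never placed either.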
A perhaps better version is the following:
```python
from collections import deque

def peek(q, default=None):
    """Safely peek at the front of the queue; return default if it is empty."""
    return q[0] if q else default

def fit(target, source):
    ds = deque(source)
    # Consume the next source character when it matches, otherwise emit a space
    return ''.join(ds.popleft() if peek(ds) == e else ' ' for e in target)
```
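It is better in the sense that you do not need to keep track of the state variables i and j as in the previous approach. Unlike the loop, it also pads the result with trailing spaces up to the full length of target.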
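Both versions above are greedy, which is fine when source is a subsequence of target but loses matches otherwise (for `('abc', 'xbc')` both return only spaces). A standard way to get an optimal fit is to align the strings along a longest common subsequence (LCS). When the only allowed edits are matching a character or skipping it, the edit distance is `len(target) + len(source) - 2 * LCS`, so maximizing the LCS minimizes that distance. The sketch below, under the hypothetical name `fit_lcs`, uses the textbook dynamic-programming table; it assumes that source characters which cannot be matched are simply dropped and that the result should be exactly as long as target:

```python
def fit_lcs(target, source):
    """Hypothetical LCS-based fit: keep target characters that belong to one
    longest common subsequence with source and replace the rest with spaces."""
    n, m = len(target), len(source)
    # dp[i][j] = length of the LCS of target[i:] and source[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if target[i] == source[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    # Walk the table forwards to recover one optimal alignment
    result = []
    i = j = 0
    while i < n:
        if j < m and target[i] == source[j]:
            result.append(target[i])  # matched character
            i += 1
            j += 1
        elif j < m and dp[i][j + 1] >= dp[i + 1][j]:
            j += 1  # dropping this source character loses nothing
        else:
            result.append(' ')  # target character not in the LCS
            i += 1
    return ''.join(result)


print(repr(fit_lcs('algorithm', 'lgrthm')))  # ' lg r thm'
print(repr(fit_lcs('abc', 'xbc')))  # ' bc'
```

The table costs O(len(target) * len(source)) time and memory, which is unproblematic for short strings like these.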
I took a naive but simple approach:
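```python
def fit(word1, word2):
    A, B = list(word1), list(word2)
    if len(B) < len(A):
        # Pad B to the length of A with None, which never equals a character
        B += (len(A) - len(B)) * [None]
    else:
        # word2 is at least as long as word1: keep the characters of word1 found in word2
        return ''.join(x if x in B else ' ' for x in A)
    # Wherever the two lists disagree, shift B right by inserting a space
    for i in range(len(A)):
        if A[i] != B[i]:
            B.insert(i, ' ')
    # Drop the padding
    return ''.join(x for x in B if x is not None)
```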
Test results:
```
algorithm lgrthm
 lg r thm
---
pineapple pine
pine     
---
pineapple apple
    apple
---
pineapple eale
   ea  le
---
foo fo
fo 
---
stack sak
s a k
---
over or
o  r
---
flow lw
 l w
---
```
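As for a prewritten algorithm: Python's standard library ships `difflib.SequenceMatcher`, whose `get_matching_blocks()` returns the matching runs `(a, b, size)` between two sequences. A minimal sketch, under the hypothetical name `fit_difflib`, copies the matched runs of target into a string of spaces. Note that the difflib documentation states that SequenceMatcher does not yield minimal edit sequences, so unlike an LCS alignment this is not guaranteed to be optimal. Passing `autojunk=False` disables its heuristic that treats very frequent characters in inputs of 200 or more characters as junk:

```python
from difflib import SequenceMatcher


def fit_difflib(target, source):
    """Hypothetical helper: place source under target using difflib's matching blocks."""
    result = [' '] * len(target)
    matcher = SequenceMatcher(None, target, source, autojunk=False)
    for block in matcher.get_matching_blocks():
        # block.a indexes target; block.size is the length of the matching run
        result[block.a:block.a + block.size] = target[block.a:block.a + block.size]
    return ''.join(result)


print(repr(fit_difflib('algorithm', 'lgrthm')))  # ' lg r thm'
```

The final block returned by `get_matching_blocks()` always has size 0, so it leaves the result unchanged.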