I am working on a script able to make an approximate match of a certain pattern in a string, reporting just the positions in which these patterns (they could be overlapping) initiate.
So far, I obtained a script able to report the positions of the exact match, but with no success for approximate ones:
import re
stn = 'KLHLHLHKPLHLHLPHHKLHKLPKPH'
pat = 'KLH'
matches = re.finditer(r'(?=(%s))' % re.escape(pat), stn)
finalmatch= [m.start() for m in matches]
pos = ' '.join(str(v) for v in finalmatch)
print pos
the result in this case is: 0 17 but what if the script report also approximate matches? i.e. if the maximum permitted error (tolerance or threshold) is 1 (in any position of the query pattern), how can the initial positions of HLH, PLH, KLP, KPH be reported?
I already tried to include distance measure like Levenshtein or SequenceMatcher, but with no success.
Thanks in advance for your help.