I would like to match a given string `match_text` against a longer string `text`. Specifically, I want to find the start location in `text` of the substring closest to `match_text` (you can assume that there is only one such location). My current code uses a `for` loop over every window of `text` that has the same length as `match_text` and calculates the Levenshtein distance for each window. However, sometimes `text` is really long (up to 90k characters), and I'm not sure if there is a faster way to do this string search. Here is the current version of the snippet that I wrote:
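import numpy as np
import Levenshtein as lev  # pip install python-Levenshtein


def find_start_position(text, match_text):
    """Return the start index of the window of text closest to match_text."""
    match_len = len(match_text)
    lev_distances = []
    # Slide a window of len(match_text) over text; the +1 includes the last window.
    for i in range(len(text) - match_len + 1):
        lev_distances.append(lev.distance(match_text, text[i:i + match_len]))
    # Index of the smallest edit distance (the first one on ties).
    pos = np.argmin(lev_distances)
    return pos


# example
find_start_position('I think this is really cool.', 'this iz')
>> 8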
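To make the brute-force approach above concrete without the third-party dependency, here is a minimal pure-Python sketch of the same idea. The names `levenshtein` and `find_start_position_pure` are made up for illustration and are not part of any library. It is only meant to show what is being computed: the cost is still roughly `len(text) * len(match_text)**2` character comparisons, and it will be much slower than the C-backed `Levenshtein` package.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    prev = list(range(len(b) + 1))  # distance from '' to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ''
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def find_start_position_pure(text, match_text):
    """Hypothetical helper: start of the fixed-length window closest to match_text.

    Assumes len(match_text) <= len(text); otherwise it simply returns 0.
    """
    match_len = len(match_text)
    best_pos, best_dist = 0, None
    for i in range(len(text) - match_len + 1):
        dist = levenshtein(match_text, text[i:i + match_len])
        # Strict comparison keeps the first window on ties.
        if best_dist is None or dist < best_dist:
            best_pos, best_dist = i, dist
    return best_pos


print(find_start_position_pure('I think this is really cool.', 'this iz'))
```

On the example above this picks the window `'this is'`, which is one substitution away from `'this iz'`.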
I would appreciate it if someone knows of, or has, a faster way to do this string search.