I've got a large list of names (strings) that I need to check against each other for typos.
To do this, I've been using the python-Levenshtein extension from PyPI to compare the names pairwise as I iterate over the list, treating any pair with a Levenshtein distance of 1 as a typo.
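For reference, here is a minimal sketch of that all-pairs check. To keep it dependency-free, it uses a hypothetical pure-Python helper `one_edit_apart` in place of python-Levenshtein; `Levenshtein.distance(a, b) == 1` should give the same answer. Both function names are illustrative, not part of any library.

```python
from itertools import combinations


def one_edit_apart(a, b):
    """True if a and b have a Levenshtein distance of exactly 1."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) string
    i = 0
    while i < len(a) and a[i] == b[i]:  # skip the common prefix
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # exactly one substitution at i
    return a[i:] == b[i + 1:]  # exactly one character inserted at i


def find_typos(names):
    """All-pairs scan: n * (n - 1) / 2 comparisons for n names."""
    return [(a, b) for a, b in combinations(names, 2) if one_edit_apart(a, b)]


print(find_typos(["cat 1", "cat 2", "mouse", "mous"]))
# 'cat 1' / 'cat 2' is flagged alongside 'mouse' / 'mous'.
```

The pair count grows quadratically with the list length, which is where the efficiency problem comes from.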
I am running into a problem with names such as 'cat 1' and 'cat 2', which are clearly different cats (not a typo) but get flagged because their Levenshtein distance is 1.
I've tried adding a check beforehand that looks for digits in each string, but because the list is so long, this doesn't do much for efficiency.
Ideally, I'm looking for a way to specify that if the only character that changes is a digit (i.e., 'cat 1' vs 'cat 2'), the pair is not considered a typo.
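One way to express that rule is sketched below in pure Python. It assumes that an edit touching only digits (one digit swapped for another, or a single digit added or removed) marks a different item rather than a typo. `edit_chars` and `is_typo` are hypothetical helpers; with python-Levenshtein, `Levenshtein.editops` could be used to locate the changed position instead.

```python
def edit_chars(a, b):
    """Return the (old, new) characters if a and b are exactly one edit apart
    ('' stands for the missing side of an insertion/deletion), else None."""
    if a == b or abs(len(a) - len(b)) > 1:
        return None
    i, n = 0, min(len(a), len(b))
    while i < n and a[i] == b[i]:  # skip the common prefix
        i += 1
    if len(a) == len(b):  # substitution
        return (a[i], b[i]) if a[i + 1:] == b[i + 1:] else None
    if len(a) < len(b):  # insertion into a
        return ("", b[i]) if a[i:] == b[i + 1:] else None
    return (a[i], "") if a[i + 1:] == b[i:] else None  # deletion from a


def is_typo(a, b):
    """True if a and b are one edit apart and the edit is not digits-only.

    Assumed rule: 'cat 1' -> 'cat 2' or 'cat 1' -> 'cat 12' is a different
    item. Note that str.isdigit also accepts other Unicode digits such as '²'.
    """
    chars = edit_chars(a, b)
    if chars is None:
        return False  # identical, or more than one edit apart
    return not "".join(chars).isdigit()


for a, b in [("cat 1", "cat 2"), ("cat 1", "cat 12"), ("Smith", "Smyth"), ("cat 1", "cat l")]:
    print(a, "|", b, "->", is_typo(a, b))
```

The check walks each pair once and exits early when the lengths differ by more than one, so it adds little cost per comparison. A digit swapped for a letter ('cat 1' vs 'cat l') is still flagged, since that can be a genuine typo.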
Suggestions for a different extension or method are welcome; my greatest concern is efficiency since, as mentioned, I have a big list.
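On the efficiency side, most of the cost is comparing every pair. A common alternative, sketched here as an assumption rather than a drop-in solution, is a deletion-neighborhood index (the idea behind symmetric-delete spelling correctors such as SymSpell). Each name is bucketed under itself and under every variant with one character removed. Two names one edit apart always share a bucket: a substitution at position i disappears once i is deleted from both, and an insertion disappears once the extra character is deleted from the longer name. Sharing a bucket does not guarantee a distance of 1 (e.g. 'ab' and 'ba'), so each candidate pair is still verified. All function names are hypothetical, and the verifier applies the same assumed digit rule (a digits-only change is not a typo).

```python
from collections import defaultdict


def edit_chars(a, b):
    """Return the (old, new) characters if a and b are exactly one edit apart
    ('' stands for the missing side of an insertion/deletion), else None."""
    if a == b or abs(len(a) - len(b)) > 1:
        return None
    i, n = 0, min(len(a), len(b))
    while i < n and a[i] == b[i]:  # skip the common prefix
        i += 1
    if len(a) == len(b):  # substitution
        return (a[i], b[i]) if a[i + 1:] == b[i + 1:] else None
    if len(a) < len(b):  # insertion into a
        return ("", b[i]) if a[i:] == b[i + 1:] else None
    return (a[i], "") if a[i + 1:] == b[i:] else None  # deletion from a


def is_typo(a, b):
    """One edit apart, and the edit is not digits-only (assumed rule)."""
    chars = edit_chars(a, b)
    return chars is not None and not "".join(chars).isdigit()


def find_typos(names):
    """Return sorted (name, name) pairs flagged as likely typos."""
    unique = list(dict.fromkeys(names))  # drop exact duplicates, keep order
    # Bucket every name under itself and under each one-deletion variant.
    buckets = defaultdict(set)
    for idx, s in enumerate(unique):
        buckets[s].add(idx)
        for i in range(len(s)):
            buckets[s[:i] + s[i + 1:]].add(idx)

    checked = set()
    typos = []
    for members in buckets.values():
        ordered = sorted(members)
        for x, j in enumerate(ordered):
            for k in ordered[x + 1:]:
                if (j, k) in checked:
                    continue  # pair already verified via another bucket
                checked.add((j, k))
                if is_typo(unique[j], unique[k]):
                    typos.append((unique[j], unique[k]))
    return sorted(typos)


names = ["cat 1", "cat 2", "Smith", "Smyth", "mouse", "mous", "dog", "dgo"]
print(find_typos(names))  # [('Smith', 'Smyth'), ('mouse', 'mous')]
```

The trade-off is memory: a name of length L produces L + 1 keys. Only names that share a bucket are compared, but a very common key (for example, many short names) can still produce a large bucket.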