I want to match a string against another string produced by OCR (Optical Character Recognition).
Usually, OCR-read text is imperfect. In my case, 5's are misrecognized as S, and so on.
So I am wondering if there's a way to calculate an edit distance with custom substitution costs.
For example, if I want to calculate a distance from 5S00AS to SSOODS, I would want to make the substitution distance for A to D large and 5 to S small so that distance('5S00AS', 'SSOOAS') is much smaller than distance('5S00AS', 'PDDDDA').
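To make the idea concrete, here is a minimal sketch of what I have in mind: a plain Levenshtein dynamic program where the substitution cost comes from a lookup table. The function name `weighted_edit_distance`, the `OCR_COSTS` table, and every cost value in it are made up for illustration, not taken from any library.

```python
def weighted_edit_distance(a, b, sub_cost=None, indel_cost=1.0, default_sub=1.0):
    """Levenshtein distance with per-pair substitution costs (illustrative sketch)."""
    sub_cost = sub_cost or {}

    def cost(x, y):
        if x == y:
            return 0.0
        # Assumption: the table is symmetric, so try (x, y) and then (y, x).
        return sub_cost.get((x, y), sub_cost.get((y, x), default_sub))

    # prev[j] is the distance between the processed prefix of a and b[:j].
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * indel_cost]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + indel_cost,        # delete ca
                curr[j - 1] + indel_cost,    # insert cb
                prev[j - 1] + cost(ca, cb),  # substitute ca with cb (or match)
            ))
        prev = curr
    return prev[-1]


# Hypothetical costs: look-alike pairs are cheap, A <-> D is expensive.
# A substitution cost above 2 * indel_cost has no further effect, because
# a delete followed by an insert is then the cheaper path.
OCR_COSTS = {("5", "S"): 0.1, ("0", "O"): 0.1, ("1", "I"): 0.1, ("A", "D"): 2.0}

near = weighted_edit_distance("5S00AS", "SSOOAS", OCR_COSTS)
far = weighted_edit_distance("5S00AS", "PDDDDA", OCR_COSTS)
print(near < far)
```

With these made-up costs the look-alike string comes out far closer than the unrelated one, which is the behavior I am after.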
I think Soundex is in a similar vein, except that it gives similar-sounding spellings a smaller distance. Here, similar-looking spellings should have a smaller distance.
I wonder if there is already a function or package for doing this type of distance calculation.