I have a bunch of company names to match, for example, I want to match this string: A&A PRECISION
with A&A PRECISION ENGINEERING
However, almost every similarity measure I use: like Hamming distance, Levenshtein distance, Restricted Damerau-Levenshtein distance, Full Damerau-Levenshtein distance, Longest Common Substring distance, Q-gram distance, cosine distance, Jaccard distance Jaro, and Jaro-Winkler distance
matches: B&B PRECISION
instead.
Any idea which metric would give more emphasis to the preciseness of the substrings and its sequence matched and care less about the length of the string? I think it is because of the length of the string that the metrics would always choose wrongly.