0

If I have two columns both are retrieved from different resources but with the same Identifier and I need to check if they are similar but there might be only differences in the spelling or the are completely different.

2 Answers2

1

If you want to check whether the two sentences are similar except for spelling differences then, you can use the Normalized Levenshtein Distance or the string edit distance.

s1= "Quick brown fox"
s2= "Quiqk drown fox"

The Levenshtein distance between the two sentences is two.

If you want to check for semantic differences, then you will have to probably use machine learning based model. Simplest thing you can do for semantic similarity is use a model like Sentence2Vec or Doc2Vec and get semantic embeddings for two sentences and compute their dot product.

0

As shubh gupta noted above me, there are measures of distance among strings. They usually return a magnitude related to the difference of characters or substrings. Tge Levenshtein Distance is one of the most common one. You can find a really cool articule that explains how it works here.

Looking on how your question is stated, I do not think you're looking for the semantic difference between your two input strings, you would need an NLP model to do that. Maybe you can restate your question and provide more information on exactly the difference that you want to measure.

ignacioct
  • 325
  • 1
  • 12