
I have to implement a fuzzy matching solution for a client and plan to use Damerau-Levenshtein distance for it. So far so good, but I'm concerned about cascades, collapses, chains, or whatever you want to call them: A matches B and B matches C, but A doesn't match C, and C might in turn match something else, and so on. In theory, all records could collapse into a single record. Even if that doesn't happen, what is the industry standard for handling this problem? All the sources I've read seem to conveniently ignore it, but to me it seems to be the actual hard part of fuzzy matching, not the trivial choice of edit distance.
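To make the concern concrete, here is a minimal sketch of the chaining effect. It assumes the restricted (optimal string alignment) variant of Damerau-Levenshtein, a match threshold of 1 edit, and four made-up records; the function names `osa_distance`, `matches`, and `transitive_clusters` are my own, not from any library.

```python
from itertools import combinations


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def matches(a: str, b: str, max_dist: int = 1) -> bool:
    """Assumed match rule: edit distance at or below a fixed threshold."""
    return osa_distance(a, b) <= max_dist


def transitive_clusters(records, max_dist=1):
    """Merge every matching pair with union-find (i.e. allow the cascade).

    Assumes the records are unique strings.
    """
    parent = {r: r for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(records, 2):
        if matches(a, b, max_dist):
            parent[find(a)] = find(b)

    clusters = {}
    for r in records:
        clusters.setdefault(find(r), []).append(r)
    return sorted(sorted(c) for c in clusters.values())


records = ["cat", "cot", "cog", "dog"]  # hypothetical data
print(matches("cat", "cot"), matches("cot", "cog"), matches("cat", "cog"))
print(transitive_clusters(records))
```

Each neighbouring pair is one edit apart, so "cat" and "dog" end up in the same cluster even though they are three edits apart and would never match directly.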

Is the industry standard to just ignore this? Or to allow the cascade and just control the order in which it happens? Or is the answer "it depends on what the client wants"?

This isn't even exclusive to fuzzy matching: any time you have an "OR" condition in your match criteria, you run into the same problem, but I never see it addressed anywhere.
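A small illustration of the exact-match case, assuming a hypothetical rule "same email OR same phone" and three invented records (the field names and the `or_match` helper are made up for this sketch):

```python
def or_match(r1: dict, r2: dict) -> bool:
    """Hypothetical rule: two records match on equal email OR equal phone."""
    return r1["email"] == r2["email"] or r1["phone"] == r2["phone"]


# Invented records: A and B share an email, B and C share a phone.
a = {"id": "A", "email": "x@example.com", "phone": "111"}
b = {"id": "B", "email": "x@example.com", "phone": "222"}
c = {"id": "C", "email": "y@example.com", "phone": "222"}

print(or_match(a, b))  # A matches B via email
print(or_match(b, c))  # B matches C via phone
print(or_match(a, c))  # A and C share nothing
```

No edit distance is involved here, yet the match relation is still not transitive: A links to C only through B.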

I played around with different solutions I thought of myself, but I'm not sure there's a definitive correct answer.

JanB