I have a data set that is sorted by company name. Some names are misspelled or inconsistently capitalized, so they show up as separate unique entries:
Name
ABC Company
ABc Company
DEF Company
def compANY
Ddf Cmpany
abC comPany
In fact, these entries are variations of the same two company names. This is clearly a problem with my initial data set, but I need to fix it to process my data correctly. The result I want is:
Name
ABC Company
DEF Company
I don't know how I can approach this, other than long loops that test modified versions of the words against a dictionary-like data structure. Is there a library for spellchecking (and would that even make sense for company names)?
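To make the loop idea concrete, here is a minimal sketch, assuming Python and its standard-library `difflib` module. The helper names `normalize` and `canonicalize` and the 0.8 similarity cutoff are placeholders I made up for illustration, not an established method, and I have only tried it against the sample above.

```python
from difflib import get_close_matches


def normalize(name):
    """Case-fold and collapse whitespace so case-only variants compare equal."""
    return " ".join(name.casefold().split())


def canonicalize(names, cutoff=0.8):
    """Map each name to the first-seen spelling of its closest group.

    cutoff is an assumed similarity threshold (0..1); tune it on real data.
    """
    canonical = {}  # normalized key -> first original spelling seen
    result = []
    for name in names:
        key = normalize(name)
        # Look for an already-known key that is similar enough to this one.
        match = get_close_matches(key, list(canonical), n=1, cutoff=cutoff)
        if match:
            result.append(canonical[match[0]])
        else:
            # No close match: treat this spelling as a new canonical name.
            canonical[key] = name
            result.append(name)
    return result


names = [
    "ABC Company",
    "ABc Company",
    "DEF Company",
    "def compANY",
    "Ddf Cmpany",
    "abC comPany",
]
cleaned = canonicalize(names)
print(sorted(set(cleaned)))  # ['ABC Company', 'DEF Company']
```

On the sample this collapses the six entries into the two spellings that appear first. The weak point is the cutoff: too low and distinct companies with similar names get merged, too high and real misspellings stay separate, and each name is compared against every group seen so far, which is why I am hoping for something better than this.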
I'd appreciate any help and don't have a preference for any package. Thank you.