I have a list of words that my users entered. After some cleanup (to correct spelling mistakes), I have the following list, where each row is a string followed by the number of times it was entered:
Pepsi 500
Coke 358
Dr. pepper 254
Sprite 204
Coca cola 159
7 up 140
Mountain dew 137
Diet coke 58
Mtn. dew 50
Now I would like a script that goes over this list and groups similar words. For example, it should merge Coke, Coca cola and Diet coke into one group, because they are all synonyms of Coca cola.
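As a baseline, the merge described above can be done with a hand-maintained alias table. This is a minimal sketch, assuming such a table exists: `ALIASES` and `group_counts` are hypothetical names, the table maps lowercase variants to a canonical name, and any word not in the table forms its own group.

```python
from collections import defaultdict

# Counts copied from the list above.
COUNTS = {
    "Pepsi": 500,
    "Coke": 358,
    "Dr. pepper": 254,
    "Sprite": 204,
    "Coca cola": 159,
    "7 up": 140,
    "Mountain dew": 137,
    "Diet coke": 58,
    "Mtn. dew": 50,
}

# Hypothetical hand-maintained alias table: lowercase variant -> canonical name.
ALIASES = {
    "coke": "Coca cola",
    "coca cola": "Coca cola",
    "diet coke": "Coca cola",
    "mtn. dew": "Mountain dew",
}


def group_counts(counts, aliases):
    """Merge words that map to the same canonical name and sum their counts."""
    groups = defaultdict(lambda: {"total": 0, "members": []})
    for word, n in counts.items():
        # Words missing from the alias table are their own canonical name.
        canonical = aliases.get(word.lower(), word)
        groups[canonical]["total"] += n
        groups[canonical]["members"].append(word)
    return dict(groups)


groups = group_counts(COUNTS, ALIASES)
# Print groups from most to least frequent.
for name, g in sorted(groups.items(), key=lambda kv: kv[1]["total"], reverse=True):
    print(f"{name}: {g['total']} {g['members']}")
```

With this table, "Coca cola" collects Coke, Coca cola and Diet coke for a total of 575 (358 + 159 + 58). The table is exact and predictable, but it has to be written by hand.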
I saw that NLTK's WordNet interface has some similarity functions. Can I use them, or is there a better way to approach this problem?
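For context: NLTK's WordNet similarity functions (for example `path_similarity` and `wup_similarity` on synsets) compare word meanings through WordNet's hierarchy, so they only help for words WordNet actually contains. Brand names and abbreviations such as "Mtn. dew" may well be missing. For spelling variants, a plain string-similarity measure is a simpler alternative. The sketch below swaps in Python's standard `difflib.SequenceMatcher` (not WordNet) and uses a greedy clustering where the most frequent word in each group is its leader. The `normalize` and `group_by_spelling` helpers and the 0.7 threshold are assumptions chosen for illustration, not tuned values.

```python
import re
from difflib import SequenceMatcher

# Counts copied from the list above.
COUNTS = {
    "Pepsi": 500, "Coke": 358, "Dr. pepper": 254, "Sprite": 204,
    "Coca cola": 159, "7 up": 140, "Mountain dew": 137,
    "Diet coke": 58, "Mtn. dew": 50,
}


def normalize(word):
    """Lowercase, drop punctuation and collapse whitespace ('Mtn. dew' -> 'mtn dew')."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", word.lower())
    return " ".join(cleaned.split())


def group_by_spelling(counts, threshold=0.7):
    """Greedy clustering: each word joins the first group whose leader is similar enough.

    The threshold of 0.7 is an assumption for this example.
    """
    groups = {}  # leader word -> list of member words
    # Visit the most frequent words first so they become group leaders.
    for word in sorted(counts, key=counts.get, reverse=True):
        for leader in groups:
            ratio = SequenceMatcher(None, normalize(word), normalize(leader)).ratio()
            if ratio >= threshold:
                groups[leader].append(word)
                break
        else:
            # No sufficiently similar leader: start a new group.
            groups[word] = [word]
    return groups


groups = group_by_spelling(COUNTS)
for leader, members in groups.items():
    print(leader, members)
```

Here "Mtn. dew" joins "Mountain dew", but Coke, Coca cola and Diet coke stay in separate groups. Spelling similarity cannot tell that they are the same drink, so synonyms like these still need a dictionary such as the alias table, or a semantic resource such as WordNet if it happens to cover the brand.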