
I am writing a spell checker using NLTK and WordNet. I have a few misspelt words, say "belive". What I want to do is find all words in WordNet that are within a Levenshtein edit distance of 1 or 2 from the given word. Does NLTK provide any methods to accomplish this? How can I do it?


Maybe I phrased that badly. The edit_distance method takes two arguments: edit_distance(word1, word2) returns the Levenshtein distance between word1 and word2. What I want is to find the edit distance between the word I give and every other word in WordNet.

Lasse V. Karlsen
Nihar Sarangi
  • 1
    Are you sure Wordnet is what you want here? Seems like overkill. Enchant may be better: http://packages.python.org/pyenchant/ – Jesse Aldridge Sep 21 '11 at 21:12

2 Answers


NLTK does in fact provide an edit_distance function, called as nltk.edit_distance(word1, word2). See the docs here
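To make concrete what such a function computes, here is a minimal pure-Python sketch of the Levenshtein distance using the standard dynamic-programming recurrence. The name levenshtein is hypothetical and this is not NLTK's own implementation; with unit costs for insertion, deletion and substitution it should agree with nltk.edit_distance at its default settings.

```python
def levenshtein(a, b):
    """Levenshtein distance between strings a and b (unit costs)."""
    # previous[j] = distance between the prefix of a processed so far
    # (before the current character) and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning i characters of a into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    for candidate in ("believe", "relive", "relieve"):
        print(candidate, levenshtein("belive", candidate))
```

The function still compares only two words at a time, so finding every WordNet word near "belive" means calling it once per candidate word.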

brc

Okay, finally came up with a solution:

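from nltk.corpus import wordnet

# Step 1: write the word part of every synset name, one per line.
with open("wordnet_wordlist.txt", "w") as f:
    for syn in wordnet.all_synsets():
        # name() is a method in NLTK 3.x (it was a plain attribute in older releases).
        # rsplit drops the ".pos.nn" suffix, e.g. "dog.n.01" -> "dog".
        f.write(syn.name().rsplit(".", 2)[0] + "\n")

# Step 2: remove duplicates (a word appears once per sense) and blank lines.
with open("wordnet_wordlist.txt") as f:
    uniquelines = set(line.strip() for line in f if line.strip())
with open("wordnet_wordlist_final.txt", "w") as f2:
    f2.write("".join(line + "\n" for line in uniquelines))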

The list of candidates can now be found by reading the final word list (wordnet_wordlist_final.txt) and comparing each entry with the misspelt word using nltk.edit_distance:

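import nltk

# Load the word list once; strip newlines and skip blank lines.
with open("wordnet_wordlist_final.txt") as wordlist_file:
    wordlist = [line.strip() for line in wordlist_file if line.strip()]

def edit(word, distance):
    """Return the words whose edit distance from word is exactly distance."""
    validlist = []
    for valid in wordlist:
        # The edit distance is never smaller than the difference in length,
        # so candidates that differ too much in length can be skipped cheaply.
        if abs(len(valid) - len(word)) <= distance:
            if nltk.edit_distance(word, valid) == distance:
                validlist.append(valid)
    return validlist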
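Scanning the whole word list costs one edit_distance call per word. A different technique, sketched below, generates every string one or two edits away from the input and intersects the result with the vocabulary held in a set. This is not the method used above: the names edits1 and candidates, the lowercase ASCII alphabet and the tiny sample vocabulary are all assumptions made for illustration. Real WordNet entries also contain underscores, digits and hyphens, which this alphabet would miss.

```python
import string

# Assumption: words consist of lowercase ASCII letters only.
ALPHABET = string.ascii_lowercase


def edits1(word):
    """All strings exactly one insertion, deletion or substitution away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    replaces = {left + c + right[1:]
                for left, right in splits if right for c in ALPHABET}
    inserts = {left + c + right for left, right in splits for c in ALPHABET}
    # A substitution by the same letter reproduces the word, so drop it.
    return (deletes | replaces | inserts) - {word}


def candidates(word, vocabulary):
    """Return (words at distance 1, words at distance 2) found in vocabulary."""
    one = edits1(word)
    two = set()
    for edited in one:
        two |= edits1(edited)
    # Keep only strings that need two edits: drop distance 0 and distance 1.
    two -= one
    two.discard(word)
    return one & vocabulary, two & vocabulary


if __name__ == "__main__":
    # Hypothetical stand-in for the WordNet word list.
    sample_vocabulary = {"believe", "relive", "belie", "alive", "relieve", "python"}
    distance1, distance2 = candidates("belive", sample_vocabulary)
    print(sorted(distance1))
    print(sorted(distance2))
```

The cost here depends on the length of the input word rather than on the size of the vocabulary, which tends to help when the word list is large and the maximum distance is small.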
Nihar Sarangi