Efficient data structure for searching a dictionary of words in python using difflib?

Question

I'm writing a spellchecker and want to use difflib to implement it. I have a list of technical terms that I added to the standard Unix dictionary (/usr/share/dict/words), and I store the combined list in a file called dictionaryFile.py.

A second script, stringSim.py, imports the dictionary and tests sample strings against it. Here is the basic code:

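from __future__ import print_function  # lets the print() calls run on Python 2 and 3
import difflib
import time

from dictionaryFile import wordList  # dictionary words plus technical terms

inputString = "dictiunary"
print("Search query: " + inputString)
startTime = time.time()

for term in inputString.split():
    termL = term.lower()
    print("Search term: " + term)
    # At most 3 candidates (default n) with a similarity ratio >= 0.6 (default cutoff)
    closeMatches = difflib.get_close_matches(termL, wordList)
    if not closeMatches:
        # get_close_matches returns [] when nothing reaches the cutoff
        print("No matches found")
    elif closeMatches[0] == termL:
        print("Perfect Match")
    else:
        print("Possible Matches")
        print("\n".join(closeMatches))

print(time.time() - startTime, "seconds")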

Running it prints the following:

$ python stringSim.py 
Search query: dictiunary
Search term: dictiunary
Possible Matches
dictionary
dictionary's
discretionary
0.492614984512 seconds
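The ranking in that output comes from difflib.SequenceMatcher.ratio(), which is 2*M/T for M matched characters out of T characters in both strings; get_close_matches keeps at most n words (default 3) whose ratio reaches cutoff (default 0.6). The sketch below reproduces that scoring so each candidate can be printed next to its score. The five-word list is a hypothetical stand-in for wordList, and scored_matches is an illustrative helper, not part of difflib.

```python
import difflib

# Hypothetical stand-in for the real wordList from dictionaryFile.py
word_list = ["dictionary", "dictionary's", "discretionary", "diction", "vocabulary"]


def scored_matches(term, words, n=3, cutoff=0.6):
    """Return the n best (ratio, word) pairs whose ratio is at least cutoff."""
    matcher = difflib.SequenceMatcher()
    # SequenceMatcher caches information about seq2, so the fixed query goes there
    matcher.set_seq2(term)
    scored = []
    for word in words:
        matcher.set_seq1(word)
        score = matcher.ratio()  # 2*M/T
        if score >= cutoff:
            scored.append((score, word))
    scored.sort(reverse=True)  # best score first, ties broken by word
    return scored[:n]


for score, word in scored_matches("dictiunary", word_list):
    print("%.3f  %s" % (score, word))
```

Because the query sits in seq2 and candidates in seq1, this should rank words the same way get_close_matches does; it just exposes the scores.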

I'm wondering whether there are better strategies for looking up similar matches when a word is misspelled. This is for a web application, so I am trying to make this part of the code a little snappier. Is there a better way to structure the wordList variable (right now it is just a list of words)?
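One way to restructure wordList while keeping difflib's scoring is sketched below, assuming the words fit in memory. A set answers "is this spelled correctly?" in constant time, and grouping words by length lets the search skip whole groups, because ratio() can never exceed 2*min(la, lb)/(la + lb) for strings of lengths la and lb. WordIndex and close_matches are hypothetical names. For a misspelled term the result should equal difflib.get_close_matches on the same deduplicated words; for a correctly spelled term it returns just that word. Each fuzzy lookup still compares against every word of a compatible length, so the time saved depends on the word list and is worth measuring.

```python
import difflib
import heapq
from collections import defaultdict


class WordIndex:
    """Hypothetical helper: stores the word list as a set plus per-length buckets."""

    def __init__(self, words):
        self.words = set(words)             # constant-time exact-match test
        self.by_length = defaultdict(list)  # word length -> words of that length
        for word in self.words:
            self.by_length[len(word)].append(word)

    def close_matches(self, term, n=3, cutoff=0.6):
        if term in self.words:
            return [term]  # correctly spelled: skip the fuzzy search entirely
        matcher = difflib.SequenceMatcher()
        matcher.set_seq2(term)
        la = len(term)
        scored = []
        for lb, bucket in self.by_length.items():
            # ratio() = 2*M/(la+lb) with M <= min(la, lb), so this is an upper bound
            if 2.0 * min(la, lb) / (la + lb) < cutoff:
                continue  # no word of this length can reach the cutoff
            for word in bucket:
                matcher.set_seq1(word)
                score = matcher.ratio()
                if score >= cutoff:
                    scored.append((score, word))
        # Same tie-breaking as get_close_matches: largest (score, word) tuples first
        return [word for score, word in heapq.nlargest(n, scored)]


words = ["dictionary", "dictionary's", "discretionary", "diction", "vocabulary"]
index = WordIndex(words)
print(index.close_matches("dictiunary"))
print(index.close_matches("diction"))
```

Building the index once at application start-up, rather than per request, is what makes the set and buckets pay off.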

Thanks.

Since you have working code and are looking for improvements, you may have better results on Code Review. (wnnmaw, May 16 '14 at 17:58)

Answer 1 (score 0), answered May 16 '14 at 19:49


I'm not sure difflib is the best tool for this kind of work; spellcheckers typically use some form of edit distance, such as Levenshtein distance. NLTK includes an edit-distance implementation (edit_distance in nltk.metrics.distance), so I'd start there instead.

LetMeSOThat4U
