I am having a bit of trouble with some Python code. I have a large text file called "big.txt". My code iterates over it to collect each word into a list, then iterates over that list to strip any character that is not a letter of the alphabet. I also have a function called worddistance, which measures how similar two words are and returns a score: it adds 1 to a counter for every difference it finds, so the lower the score, the more similar the words. I have another function called autocorrect. I want to pass it a misspelled word and have it print a 'Did you mean...' sentence listing the words that scored low on worddistance.
Strangely, I keep getting the error:

"IndexError: string index out of range"

I am at a loss as to what is going on!

My code is below.

Thanks in advance for the replies,
Samuel Naughton

f = open("big.txt", "r")

words = list()

temp_words = list()
for line in f:
    for word in line.split():
        temp_words.append(word.lower())

allowed_characters = 'abcdefghijklmnopqrstuvwxyz'       
for item in temp_words:
    temp_new_word = ''
    for char in item:
        if char in allowed_characters:
            temp_new_word += char
        else:
            continue
    words.append(temp_new_word)
list(set(words)).sort()

def worddistance(word1, word2):
    counter = 0
    if len(word1) > len(word2):
        counter += len(word1) - len(word2)
        new_word1 = word1[:len(word2) + 1] 
        for char in range(0, len(word2) + 1) :
            if word2[char] != new_word1[char]:
                counter += 1
            else:
                continue
    elif len(word2) > len(word1):
        counter += len(word2) - len(word1)
        new_word2 = word2[:len(word1) + 1]
        for char in range(0, len(word1) + 1):
            if word1[char] != word2[char]:
                counter += 1
            else:
                continue
    return counter

def autocorrect(word):
    word.lower()
    if word in words:
        print("The spelling is correct.")
        return
    else:
        suggestions = list()
        for item in words:
            diff = worddistance(word, item)
            if diff == 1:
                suggestions.append(item)
       print("Did you mean: ", end = ' ')

    if len(suggestions) == 1:
                print(suggestions[0])
                return

    else:
        for i in range(0, len(suggestions)):
            if i == len(suggestons) - 1:
                print("or " + suggestions[i] + "?")
                return
            print(suggestions[i] + ", ", end="")
            return
Srivatsan

2 Answers

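In worddistance(), the loop for char in range(0, len(word1) + 1): runs one index past the end of the string (valid indices stop at len(word1) - 1), which is what raises the IndexError. It should be: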

for char in range(len(word1)):

Likewise, for char in range(0, len(word2) + 1): should be:

for char in range(len(word2)):

By the way, list(set(words)).sort() sorts a temporary list and then discards it, so words itself is never de-duplicated or sorted, which is probably not what you want. It should be:

words = sorted(set(words))
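
A minimal sketch of both points, using made-up sample words rather than the contents of big.txt (the variable names here are illustrative assumptions): indexing one past the end of a string raises IndexError, and list.sort() sorts in place and returns None, so calling it on a temporary list leaves nothing behind.

```python
# Illustrative sample data; not taken from big.txt.
word = "cat"

# range(len(word) + 1) yields 0, 1, 2, 3, but the valid indices are 0..2.
try:
    for i in range(len(word) + 1):
        word[i]
    raised = False
except IndexError:
    raised = True

# range(len(word)) stays within bounds.
letters = [word[i] for i in range(len(word))]

# list(set(words)).sort() sorts a throwaway copy and returns None,
# so `words` itself is left unchanged.
words = ["pear", "apple", "pear", "fig"]
result = list(set(words)).sort()

# sorted() returns a new, de-duplicated, ordered list that can be kept.
unique_words = sorted(set(words))
```

Here raised ends up True, result is None, and unique_words holds the sorted unique words.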
irrelephant

As mentioned in the other answer, you should loop over range(len(word1)).

In addition to that:

- You should handle the case where word1 and word2 have the same length (len(word2) == len(word1)).
- You should also take care with naming. In the second branch of worddistance, the comparison

 if word1[char] != word2[char]:

should use new_word2:

if word1[char] != new_word2[char]:

- In autocorrect, you should assign the lowercased result back to word: word = word.lower(). Strings are immutable, so calling word.lower() on its own has no effect.

words= [] 
for item in temp_words:
    temp_new_word = ''
    for char in item:
        if char in allowed_characters:
            temp_new_word += char
        else:
            continue
    words.append(temp_new_word)
words= sorted(set(words))

def worddistance(word1, word2):
    counter = 0
    if len(word1) > len(word2):
        counter += len(word1) - len(word2)
        new_word1 = word1[:len(word2) + 1] 
        for char in range(len(word2)) :
            if word2[char] != new_word1[char]:
                counter += 1
    elif len(word2) > len(word1):
        counter += len(word2) - len(word1)
        new_word2 = word2[:len(word1) + 1]
        for char in range(len(word1)):
            if word1[char] != new_word2[char]:  # compare against the truncated new_word2
                counter += 1
    else:  # len(word2) == len(word1): the original code missed this case
        for char in range(len(word1)):
            if word1[char] != word2[char]:  
                counter += 1
    return counter

def autocorrect(word):
    word = word.lower()  # str.lower() returns a new string, so assign it back
    if word in words:
        print("The spelling is correct.")
    else:
        suggestions = list()
        for item in words:
            diff = worddistance(word, item)
            if diff == 1:
                suggestions.append(item)
        print("Did you mean: ")

        if len(suggestions) == 1:
            print(suggestions[0])

        else:
            for i in range(len(suggestions)):
                if i == len(suggestions) - 1:  # last suggestion closes the sentence
                    print("or " + suggestions[i] + "?")
                else:
                    print(suggestions[i] + ", ")

Next time, try to use Python built-ins: enumerate instead of for i in range(len(some_list)) followed by some_list[i], len instead of a manual counter, and so on.
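
As a small sketch of the enumerate advice, the suggestion formatting could look roughly like this. The helper name format_suggestions and the sample words are assumptions made for illustration; they are not part of the original code.

```python
def format_suggestions(suggestions):
    """Build a 'Did you mean' line from candidate words (hypothetical helper)."""
    if not suggestions:
        return "No suggestions found."
    if len(suggestions) == 1:
        return "Did you mean: " + suggestions[0] + "?"
    parts = []
    # enumerate yields (index, value) pairs, so no suggestions[i] lookup is needed.
    for i, item in enumerate(suggestions):
        if i == len(suggestions) - 1:
            parts.append("or " + item + "?")
        else:
            parts.append(item + ",")
    return "Did you mean: " + " ".join(parts)


print(format_suggestions(["cat", "cot", "cut"]))
# prints: Did you mean: cat, cot, or cut?
```

Returning the string instead of printing inside the loop also makes the function easier to test.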

For example, your distance function could be written this way, or even more simply:

def distance(word1, word2):
    counter= max(len(word1),len(word2))- min(len(word1),len(word2))
    if len(word1) > len(word2):
        counter+= len([x for x,z in zip (list(word2), list(word1[:len(word2) + 1])) if x!=z])
    elif len(word2) > len(word1):
        counter+= len([x for x,z in zip (list(word1), list(word2[:len(word1) + 1])) if x!=z])
    else:
        counter+= len([x for x,z in zip (list(word1), list(word2)) if x!=z])
    return counter
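
One possible "simpler" form, offered as a sketch rather than a drop-in replacement: zip() already stops at the end of the shorter word, so the three branches and the slicing can collapse into a single expression. The name compact_distance is made up for this example.

```python
def compact_distance(word1, word2):
    """Count positional mismatches plus the difference in length (hypothetical name)."""
    # zip() pairs characters up to the end of the shorter word,
    # so no slicing or length comparison is needed here.
    mismatches = sum(a != b for a, b in zip(word1, word2))
    # Every extra character in the longer word counts as one difference.
    return abs(len(word1) - len(word2)) + mismatches


print(compact_distance("cat", "cot"))   # prints: 1
print(compact_distance("cat", "cats"))  # prints: 1
print(compact_distance("cat", "at"))    # prints: 3
```

Like the original worddistance, this compares characters position by position, so a dropped first letter ("cat" vs "at") scores 3 rather than 1. It is not an edit distance in the Levenshtein sense.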
Community