-1

This is the code I currently have:

from collections import defaultdict

goodwords = set()

with open("soccer.txt", "rt") as f:
    for word in f.readlines():
        goodwords.add(word.strip())

badwords = defaultdict(list)

with open("soccer.txt", "rt") as f:
    for line_no, line in enumerate(f):
        for word in line.split():
            if word not in text:
                badwords[word].append(line_no)

print(badwords)

How can I fix my code so that it prints each incorrect word stored in the words list along with the line numbers where it occurs?

For example, if the misspelled word togeher appeared on lines 5 and 7, it would print something like:

togeher 5 7
Michael0x2a
jad

2 Answers

1

When you insert the new counter into d, you first check whether word is contained in words. You probably wanted to check whether word is already contained in d:

if word not in d:
    d[word] = [counter]
else:
    d[word].append(counter)

The check for whether the word is contained in words or line should be a separate if.
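A minimal sketch of keeping the two checks separate, assuming words is the set of correctly spelled words and counter is the current line number. The sample data is made up for illustration:

```python
# Hypothetical sample data: known-good words and (line number, word) pairs.
words = {"we", "play", "together"}
lines = [(5, "togeher"), (6, "play"), (7, "togeher")]

d = {}
for counter, word in lines:
    # First check: is the word misspelled (not in the known-good set)?
    if word not in words:
        # Second, separate check: has this word been recorded before?
        if word not in d:
            d[word] = [counter]
        else:
            d[word].append(counter)

print(d)
```

The outer if decides whether the word is worth recording at all; the inner one only decides whether to create a new list or extend an existing one.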

You could also simplify this logic with the dict's setdefault() method:

d.setdefault(word, []).append(counter)

Or you can make d a defaultdict, which simplifies the assignment even more:

from collections import defaultdict
d = defaultdict(list)
...
d[word].append(counter)

About the general algorithm: note that at the moment you first iterate over all lines to increment the counter and only then, when the counter has already reached its maximum value, start checking for misspelled words. You should probably do the checking for each line inside the loop where you increment the counter.
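As a rough sketch of that single-pass structure, assuming the file's lines are already available as a list (text_lines and the sample words are hypothetical stand-ins, not the asker's data):

```python
from collections import defaultdict

# Hypothetical stand-ins for the word list and the lines of the text file.
words = {"we", "play", "together"}
text_lines = ["we play togeher", "we play", "togeher we play"]

d = defaultdict(list)
counter = 0
for line in text_lines:
    counter += 1  # line numbers start at 1 here
    # Check the words of this line while counter still refers to it.
    for word in line.split():
        if word not in words:
            d[word].append(counter)

print(dict(d))
```

Because the check runs inside the same loop that advances counter, every recorded number is the line the word was actually found on.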

sth
  • The text file is actually called soccer.txt, but I'm using sys.argv. I've only been programming for 2 months, so I'm not going to understand everything. I've changed if word not in words to if words not in d, but I still get an error: print(word, d[counter]) KeyError: 329 – jad May 23 '10 at 12:39
  • I have a list of incorrect words and want to put the line numbers where each incorrect word appears in my txt file into a set, so that it prints out helo 5 8 (5 and 8 being the line numbers in the txt file). Any suggestions on how to do that, please? – jad May 23 '10 at 13:23
0

From what you are doing, I suspect that the following would suit you almost perfectly:

from collections import defaultdict

text = ( "cat", "dog", "rat", "bat", "rat", "dog",
         "man", "woman", "child", "child") #

d = defaultdict(list)

for lineno, word in enumerate(text):
    d[word].append(lineno)

print(d)

This gives you the following output under Python 2 (Python 3 shows <class 'list'> and keeps the keys in insertion order):

defaultdict(<type 'list'>, {'bat': [3], 'woman': [7], 'dog': [1, 5],
                            'cat': [0], 'rat': [2, 4], 'child': [8, 9],
                            'man': [6]})

This simply sets up an empty default dictionary that creates a list for each item you access, so that you don't need to worry about creating the entry, and then enumerates its way over the list of words, so you don't need to keep track of the line number.

As you don't have a list of correct spellings, this doesn't actually check whether the words are correctly spelled; it just builds a dictionary of all the words in the text file.

To convert the dictionary to a set of words, try:

all_words = set(d.keys())
print(all_words)

Which produces:

set(['bat', 'woman', 'dog', 'cat', 'rat', 'child', 'man'])

Or, just to print the words:

for word in d.keys():
    print(word)
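To print each word followed by the numbers recorded for it, in the "helo 5 8" style, something along these lines might work. This is only a sketch: format_entry is a hypothetical helper and the sample words are invented.

```python
from collections import defaultdict

# Hypothetical sample: word -> positions, built the same way as d above.
d = defaultdict(list)
for lineno, word in enumerate(("helo", "cat", "dog", "helo")):
    d[word].append(lineno)

def format_entry(word, line_numbers):
    # Join the word and its numbers with single spaces, e.g. "helo 0 3".
    return " ".join([word] + [str(n) for n in line_numbers])

# Sort the words so the output order is predictable.
for word in sorted(d):
    print(format_entry(word, d[word]))
```

The numbers have to be converted with str() before joining, since " ".join() only accepts strings.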

Edit 3:

I think this might be the final version: It's a (deliberately) very crude, but almost complete spell checker.

from collections import defaultdict

# Build a set of all the words we know, assuming they're one word per line
good_words = set() # Use a set, as this will have the fastest look-up time.
with open("words.txt", "rt") as f:
    for word in f.readlines():
        good_words.add(word.strip())

bad_words = defaultdict(list)

with open("text_to_check.txt", "rt") as f:
    # For every line of text, get the line number, and the text.
    for line_no, line in enumerate(f):
        # Split into separate words - note there is an issue with punctuation,
        # case sensitivity, etc.
        for word in line.split():
            # If the word is not recognised, record the line where it occurred.
            if word not in good_words:
                bad_words[word].append(line_no)

At the end, bad_words will be a dictionary with the unrecognised words as keys and, as each matching value, the list of line numbers where that word occurred (counted from 0, as enumerate does by default).
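The punctuation and case issues noted in the comments, and the 0-based line numbers, could be handled roughly as follows. This is a sketch under assumptions: find_bad_words is a hypothetical helper, the known-good words are assumed to be stored in lower case, and the sample lines stand in for the two files.

```python
import string
from collections import defaultdict

def find_bad_words(lines, good_words):
    """Map each unrecognised word to the 1-based lines it appears on."""
    bad_words = defaultdict(list)
    # start=1 so the first line is reported as line 1 rather than 0.
    for line_no, line in enumerate(lines, start=1):
        for raw in line.split():
            # Strip surrounding punctuation and ignore case before the lookup.
            word = raw.strip(string.punctuation).lower()
            if word and word not in good_words:
                bad_words[word].append(line_no)
    return bad_words

# Hypothetical sample data standing in for the two files.
good_words = {"we", "play", "together"}
lines = ["We play togeher.", "We play!", "Togeher, we play"]

for word, line_nos in find_bad_words(lines, good_words).items():
    print(word, *line_nos)
```

Since an open file is itself an iterable of lines, the same function should accept a file object in place of the list.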

Simon Callan
  • I actually do have a list of correct spellings called dictset = [], which is a dictionary of many, many words, but I'll try this, thanks. This is what I have: a txt file of words and a list of incorrect words, and I just want to attach the line numbers to the incorrect words. – jad May 23 '10 at 13:02
  • What you said worked, but I want it to print as a set. I have a set, but I can only print the words as a set. – jad May 23 '10 at 13:18
  • for inwords in incorrectwords: print(inwords) prints my set of incorrect words, but how do I do that with the code you showed me? Cheers – jad May 23 '10 at 13:19
  • I'm thankful for what you've done, and I believe it should work if I add one more thing: instead of printing the line number of the incorrect word, I want to print the line number where the incorrect word is located in the txt file. What would I add? I tried adding if word in txtfile: – jad May 23 '10 at 14:14
  • Updated to a minimal, but complete, example – Simon Callan May 23 '10 at 18:21
  • Hey, thanks, it's working, but I already have the incorrect words as a list and want to use that, because I'm using sys.argv[] to open the files. I have a list of words and a list of incorrect words; how can I use them instead of opening the text? Cheers – jad May 24 '10 at 01:52
  • I've updated my code above. I caused you trouble: you don't need to create bad words and goodwords, because I already have a list of the words in the txt file and a list of the misspelled words. How can I use them? Cheers – jad May 24 '10 at 02:08
  • You've just about got it - you just need to change the "word not in text" line to use goodwords instead of text, as text is not a defined variable. – Simon Callan May 24 '10 at 20:18