I have a list of sentences containing misspelled prepositions, and a list of correctly spelled prepositions:

ref_data = ['near','opposite','off','towards','behind','ahead','below','above','under','over','in','inside','outside']

I need to compute the soundex of each word in my data and replace the word with the reference word whose soundex matches. Here's my code:

for line in text1:
for word in line.split():
    if jellyfish.soundex(word)==jellyfish.soundex([words,int in enumerate(ref_data)])
       word = #replace code here

I am really confused. text1 contains sentences like ['he was nr the fountain',...many more]. Please help; my syntax is wrong.

Hypothetical Ninja

1 Answer

I'd use:

import jellyfish

# mapping from soundex to correct word
soundex_to_ref = {jellyfish.soundex(w): w for w in ref_data}

for line in text1:
    words = [soundex_to_ref.get(jellyfish.soundex(w), w) for w in line.split()]

This produces a list of words for each line, with all words that match correctly-spelled words by soundex replaced by the correctly-spelled word.

The [... for ... in ...] syntax is a list comprehension; it produces a new value for each item in the for loop. So, for each word in line.split(), the result of the soundex_to_ref.get(jellyfish.soundex(w), w) expression is added to the output list.
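To make the comprehension concrete, here is a rough sketch that writes the same step once as an explicit loop and once as a comprehension. The `lookup` dictionary is a hypothetical stand-in keyed on raw misspellings rather than soundex codes, so the snippet runs without jellyfish installed.

```python
# Hypothetical stand-in for soundex_to_ref: keyed on the raw misspelling,
# not on a soundex code, purely to keep this example dependency-free.
lookup = {"nr": "near"}
line = "he was nr the fountain"

# Explicit loop: build the output list one word at a time.
words_loop = []
for w in line.split():
    words_loop.append(lookup.get(w, w))

# List comprehension: the same result in a single expression.
words_comp = [lookup.get(w, w) for w in line.split()]

print(words_loop == words_comp)
```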

The soundex_to_ref object is a dictionary, generated from the ref_data list; for each word in that list the dictionary has a key (the soundex value for that word), and the value is the original word. This lets us look up reference words easily for a given soundex.

dict.get() lets you look up a key in a dictionary and returns a default if the key is not present. soundex_to_ref.get(jellyfish.soundex(w), w) computes the soundex for the current word w and looks up a reference word; if the soundex is not present in the dictionary, the original word is returned unchanged.
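As a small illustration of that fallback, the sketch below writes a two-entry mapping out by hand. The codes 'N600' and 'O100' are the Soundex codes for near and off, hard-coded here (an assumption made for the example) so that nothing needs to be imported.

```python
# Hand-written excerpt of the soundex -> reference-word mapping.
# The codes are hard-coded for illustration instead of being computed.
soundex_to_ref = {"N600": "near", "O100": "off"}

# 'nr' has soundex N600, which is in the mapping: the reference word comes back.
found = soundex_to_ref.get("N600", "nr")

# 'fountain' has soundex F535, which is not in the mapping: the default
# (the original word) comes back instead.
missing = soundex_to_ref.get("F535", "fountain")

print(found, missing)
```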

You can join the words list back into a sentence by using:

line = ' '.join(words)

You can rebuild text1 in one expression with:

text1 = [' '.join([soundex_to_ref.get(jellyfish.soundex(w), w) for w in line.split()])
         for line in text1]
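For an end-to-end check, here is a minimal sketch of the whole approach. The `soundex` function is a simplified pure-Python stand-in written for this example so it runs without jellyfish; in real code you would call `jellyfish.soundex` instead. The second sentence in `text1` is made up for the demonstration.

```python
def soundex(word):
    """Simplified American Soundex; a stand-in for jellyfish.soundex."""
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit

    word = word.upper()
    if not word:
        return ""
    result = word[0]
    last = codes.get(word[0])
    for ch in word[1:]:
        digit = codes.get(ch)
        if digit is not None:
            # Skip a code that repeats the previous one.
            if digit != last:
                result += digit
            last = digit
        elif ch not in "HW":
            # Vowels and other uncoded characters reset the run; H and W do not.
            last = None
        if len(result) == 4:
            break
    return result.ljust(4, "0")


ref_data = ['near', 'opposite', 'off', 'towards', 'behind', 'ahead', 'below',
            'above', 'under', 'over', 'in', 'inside', 'outside']

# Second sentence is invented for this example.
text1 = ['he was nr the fountain', 'she stood bhind the door']

# Mapping from soundex code to the correctly spelled word.
soundex_to_ref = {soundex(w): w for w in ref_data}

text1 = [' '.join([soundex_to_ref.get(soundex(w), w) for w in line.split()])
         for line in text1]

print(text1)
```

Note that soundex matching is coarse: with this stand-in, 'of' and 'off' share the same code, so a correctly spelled 'of' would be rewritten to 'off'. It may be worth skipping words that are already valid before substituting.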
Martijn Pieters
  • Hi, thanks, but I still can't see any changes in text1 after executing the above code. :( Any idea? – Hypothetical Ninja Feb 08 '14 at 10:15
  • @Sword: read on; the answer to that is at the end. Your question never said anything about altering `text1` itself, btw. – Martijn Pieters Feb 08 '14 at 10:16
  • Yeah, I'll do that in a moment. I tried the code above, but all the words are joined together; it looks like this: hewasnearthefountain. How do I deal with it? I tried adding a space in ' '.join. – Hypothetical Ninja Feb 08 '14 at 10:23
  • Right, sorry, that was meant to be a space. Corrected. – Martijn Pieters Feb 08 '14 at 10:23
  • No, then you didn't run the code now in the answer. `str.join()` joins the list of strings with the string object. `' foo '.join(['bar', 'baz', 'spam'])` produces `'bar foo baz foo spam'` for example. So if you see no spaces you are *not* using `' '.join(...)`. – Martijn Pieters Feb 08 '14 at 10:26
  • No, sorry, I won't. I understand you see words joined without spaces, which simply means you are **not** running the code I posted. – Martijn Pieters Feb 08 '14 at 10:30