
I have a text file full of common misspellings and their corrections.

All misspellings of the same intended word should be on the same line.

I have this partly done, but not every misspelling of a given word has been merged onto one line yet.

misspellings_corpus.txt (snippet):

I'de->I'd
aple->apple
appl->apple
I'ed, I'ld, Id->I'd

Desired:

I'de, I'ed, I'ld, Id->I'd
aple, appl->apple

Template: wrong1, wrong2, wrongN->correct


Attempt:

lines = []
with open('/content/drive/MyDrive/Colab Notebooks/misspellings_corpus.txt', 'r') as fin:
  lines = fin.readlines()

for this_idx, this_line in enumerate(lines):
  for comparison_idx, comparison_line in enumerate(lines):
    if this_idx != comparison_idx:
      if this_line.split('->')[1].strip() == comparison_line.split('->')[1].strip():
        pass  # ...

correct_words = [l.split('->')[1].strip() for l in lines]
correct_words
Pranav Hosangadi
StressedBoi69420
  • use a `collections.defaultdict(list)` with a key of your good spelling and append each bad spelling as a value. then once done you can write out the values() and key as you like – JonSG Sep 02 '21 at 14:44
  • I'm confused by the desired text. Shouldn't the first line be: `I'd, I'd, I'd, I'd`, and the second line also being: `apple, apple`? – jrd1 Sep 02 '21 at 14:45
  • @jrd1 The purpose is to have misspellings separated out by a comma `,`, then `->` correct spelling. I will append desired template to post. – StressedBoi69420 Sep 02 '21 at 14:47
  • @JonSG I have now appended a list of `correct_words` to the post. I will look into `collections`. – StressedBoi69420 Sep 02 '21 at 14:50

2 Answers

Store the correct spelling of each word as a dictionary key that maps to a set of its possible misspellings. The dict lets you easily look up the word you're trying to correct, and the set avoids duplicate misspellings.

possible_misspellings = {}

with open('my-file.txt') as file:
  for line in file:
    misspellings, word = line.split('->')
    word = word.strip()
    misspellings = set(m.strip() for m in misspellings.split(','))

    if word in possible_misspellings:
      possible_misspellings[word].update(misspellings)
    else:
      possible_misspellings[word] = misspellings

Then you can iterate over your dictionary and write each group to a new file:

with open('my-new-file.txt', 'w') as file:
  for word, misspellings in possible_misspellings.items():
    line = ', '.join(sorted(misspellings)) + '->' + word + '\n'
    file.write(line)
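
As a quick check against the snippet in the question, here is a minimal in-memory sketch of the same grouping idea. The function name `group_misspellings` and the `sample` list are made up for illustration, not taken from the post. A set has no defined order, so this version uses a plain dict as an ordered set, which keeps misspellings in the order they first appear while still dropping duplicates. It also assumes that blank lines and lines without `->` should simply be skipped.

```python
def group_misspellings(lines):
    """Merge 'wrong1, wrong2->correct' lines that share the same correction."""
    grouped = {}  # correct spelling -> dict used as an ordered set of misspellings
    for line in lines:
        wrong_part, sep, correct = line.strip().partition('->')
        if not sep:
            continue  # assumption: ignore blank lines and lines without '->'
        bucket = grouped.setdefault(correct.strip(), {})
        for wrong in wrong_part.split(','):
            wrong = wrong.strip()
            if wrong:
                bucket[wrong] = None  # dict keys keep first-seen order, no duplicates
    return [', '.join(bucket) + '->' + correct for correct, bucket in grouped.items()]


# Hypothetical input: the four lines from the question's snippet.
sample = ["I'de->I'd\n", "aple->apple\n", "appl->apple\n", "I'ed, I'ld, Id->I'd\n"]

for merged in group_misspellings(sample):
    print(merged)
# I'de, I'ed, I'ld, Id->I'd
# aple, appl->apple
```

To run it on a real file, pass the open file object (or the result of `readlines()`) as `lines` and write the returned strings back out, one per line.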
Diego Fidalgo
from collections import defaultdict

with open('misspellings_corpus.txt', 'r') as fin:
    lines = fin.readlines()

# Map each correct spelling to the list of its misspellings.
my_dict = defaultdict(list)

for line in lines:
    wrong_part, correct = line.split("->")
    # split(",") also handles a single misspelling with no comma
    for curr in wrong_part.replace(" ", "").split(","):
        my_dict[correct.strip()].append(curr)

# Print in the requested form: wrong1, wrong2->correct
for key, values in my_dict.items():
    print(f"{', '.join(values)}->{key}")
bilke