I am spell checking some files with Python Enchant and want it to ignore proper nouns. The benefit of it correcting misspelled proper nouns seems too small compared with the cost of it wrongly 'correcting' the ones it doesn't know (although any advice on this trade-off is also appreciated!)
This is my code, but at the moment it is still correcting the words in the NNP list.
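```python
from enchant.checker import SpellChecker
from nltk import pos_tag, word_tokenize

chkr = SpellChecker("en_GB")

with open('test_file.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# Tag each token and keep the ones tagged as singular proper nouns
tagged = pos_tag(word_tokenize(text))
NNP = [word for word, tag in tagged if tag == 'NNP']

chkr.set_text(text)
for err in chkr:
    if err is word in NNP:
        # Intended: leave proper nouns alone for the rest of the text
        err.ignore_always()
    else:
        # Replace the misspelling with Enchant's first suggestion
        sug = err.suggest()[0]
        err.replace(sug)

corrected = chkr.get_text()
print(NNP)
print(corrected)
```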
In the output, for example, 'Boojum' is changed to 'Boomer' even though it is in the NNP list.
Could someone point me in the right direction? I'm fairly new to Python. Thanks in advance.
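One thing worth checking is how Python reads `err is word in NNP`: comparison operators chain, so it means `(err is word) and (word in NNP)`. As I understand pyenchant's checker API, each `err` produced by iterating a `SpellChecker` is the checker object itself, with the flagged text in `err.word`, so the identity test `err is word` can never be true and every word falls through to the `else` branch. The sketch below illustrates this without needing Enchant installed; `FlaggedError` is a hypothetical stand-in that only mimics the `.word` attribute, and the word set is illustrative.

```python
# Hypothetical stand-in for the object yielded while iterating a pyenchant
# SpellChecker. Assumption: only its .word attribute matters for this check.
class FlaggedError:
    def __init__(self, word):
        self.word = word


proper_nouns = {"Boojum", "Snark"}  # illustrative NNP words
err = FlaggedError("Boojum")        # pretend the checker flagged "Boojum"
word = "Boojum"                     # bound only so the original condition can run

# Python chains `a is b in c` as `(a is b) and (b in c)`.
# err is an object, not the string bound to word, so this is always False.
original_condition = err is word in proper_nouns

# Testing the flagged word's text for membership gives the intended result.
fixed_condition = err.word in proper_nouns

print(original_condition)  # False
print(fixed_condition)     # True
```

In the loop above this would mean writing `if err.word in NNP:`. Building `NNP` as a set instead of a list makes each lookup constant-time, and note that tokens tagged `NNPS` (plural proper nouns) are not collected by the `tag == 'NNP'` filter.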