I am using the latest version of spacy_hunspell with Portuguese dictionaries. I noticed that for inflected verbs containing special characters, such as the acute accent (´) and the tilde (~), the spellchecker returns the wrong result. In the comments below, the first word is the value returned by spell() and the second says whether that result is right:
import hunspell
spellchecker = hunspell.HunSpell('/usr/share/hunspell/pt_PT.dic',
                                 '/usr/share/hunspell/pt_PT.aff')
#Verb: fazer
spellchecker.spell('fazer') # True, correct
spellchecker.spell('faremos') # True, correct
spellchecker.spell('fará') # False, incorrect
spellchecker.spell('fara') # True, incorrect
spellchecker.spell('farão') # False, incorrect
#Verb: andar
spellchecker.spell('andar') # True, correct
spellchecker.spell('andamos') # True, correct
spellchecker.spell('andará') # False, incorrect
spellchecker.spell('andara') # True, correct
#Verb: ouvir
spellchecker.spell('ouvir') # True, correct
spellchecker.spell('ouço') # False, incorrect
Another problem occurs when the verb is irregular, like ir:
spellchecker.spell('vamos') # False, incorrect
spellchecker.spell('vai') # False, incorrect
spellchecker.spell('iremos') # True, correct
spellchecker.spell('irá') # False, incorrect
As far as I have noticed, the problem does not happen with nouns containing special characters:
spellchecker.spell('coração') # True, correct
spellchecker.spell('órgão') # True, correct
spellchecker.spell('óbvio') # True, correct
spellchecker.spell('pivô') # True, correct
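Since the accented nouns pass, a character-encoding mismatch may well not be the cause, but it seems worth ruling out. Below is a minimal sketch of how the encoding declared by the dictionary could be checked, by reading the SET line of the .aff file. The helper name declared_encoding is hypothetical (it is not part of hunspell or spacy_hunspell), and it assumes the file follows the usual Hunspell convention of a single "SET <encoding>" line.

```python
import codecs
from pathlib import Path
from typing import Optional


def declared_encoding(aff_path: str) -> Optional[str]:
    """Return the encoding named on the SET line of a Hunspell .aff file.

    Hypothetical helper for diagnosis only. Returns None when the file
    has no SET line.
    """
    raw = Path(aff_path).read_bytes()
    # Drop a UTF-8 byte order mark, if present, so it cannot hide the SET keyword.
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    # Latin-1 maps every byte to a character, so decoding cannot fail even
    # though the file's real encoding is not known yet.
    text = raw.decode("latin-1")
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "SET":
            return parts[1]
    return None
```

Calling declared_encoding('/usr/share/hunspell/pt_PT.aff') on my setup would show which encoding the dictionary declares. It could then be compared with the encoding of the strings passed to spell().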
Any suggestions?