
I'm trying to use pyenchant to spell-check English and Bulgarian text. In English, words are recognized both when they are lower case and when they start with an upper-case letter. For example:

>>> from enchant import Dict
>>> d = Dict('en_GB')
>>> d.check('car')
True
>>> d.check('Car')
True

However, in Bulgarian:

>>> d = Dict('bg_BG')
>>> d.check('кола')
True
>>> d.check('Кола')
False
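One possible per-word workaround, sketched here under assumptions rather than taken from pyenchant itself: instead of lowering the whole text, retry only a word that fails the check and has an initial capital, in lower case. `check_with_case_fallback` is a hypothetical helper; it takes any `check` callable, such as the bound method `d.check` of an `enchant.Dict`, so it does not depend on pyenchant internals.

```python
from typing import Callable


def check_with_case_fallback(check: Callable[[str], bool], word: str) -> bool:
    """Hypothetical helper: accept `word` as-is, or, if it is written with an
    initial capital followed by lower case (e.g. 'Кола'), accept it when its
    all-lower-case form is accepted."""
    if check(word):
        return True
    # Only retry initial-capital words; leave 'КОЛА' or 'кОла' alone.
    if word[:1].isupper() and word[1:] == word[1:].lower():
        return check(word.lower())
    return False
```

With pyenchant this would be called as `check_with_case_fallback(d.check, 'Кола')`; inside a `SpellChecker` loop, an error could be skipped whenever the helper accepts `err.word`. This mirrors the English behaviour shown above without touching words elsewhere in the paragraph.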

Is this normal behavior? I want to use the SpellChecker class to check whole paragraphs, and this is getting in the way. I don't really want to call .lower() on the whole string, as that seems like a hack. I'm using pyenchant==1.6.8 and Python 3.5.2. The en_US and en_GB dictionaries came with pyenchant; I downloaded the bg_BG dictionary from https://cgit.freedesktop.org/libreoffice/dictionaries/plain/bg_BG/bg_BG.dic (and the matching .aff file from the same directory). I had to convert both files from Windows-1251 to UTF-8, because otherwise no words were recognized at all.
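The Windows-1251 to UTF-8 conversion mentioned above can be done in a few lines of Python. This is a minimal sketch, assuming plain Hunspell .dic/.aff files; the function name is hypothetical. It also assumes the .aff file may declare its encoding with a Hunspell `SET` line, which should then say `UTF-8` so that it matches the re-encoded bytes.

```python
from pathlib import Path


def convert_cp1251_to_utf8(src: Path, dst: Path) -> None:
    """Hypothetical helper: re-encode a Hunspell .dic/.aff file from
    Windows-1251 to UTF-8, assuming `src` is entirely Windows-1251 text."""
    is_aff = src.suffix.lower() == ".aff"
    # newline="" keeps the original line endings (\n or \r\n) untouched.
    with open(str(src), encoding="cp1251", newline="") as fin, \
            open(str(dst), "w", encoding="utf-8", newline="") as fout:
        for line in fin:
            # Assumption: an .aff file may contain "SET <encoding>";
            # after re-encoding, that declaration must read UTF-8.
            if is_aff and line.split(maxsplit=1)[:1] == ["SET"]:
                ending = line[len(line.rstrip("\r\n")):]
                line = "SET UTF-8" + ending
            fout.write(line)
```

For example, `convert_cp1251_to_utf8(Path('bg_BG.aff'), Path('utf8/bg_BG.aff'))`, then the same call for the .dic file.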

  • I don't know much about the Bulgarian dictionary, but note that you can add a personal word list (e.g. with capitalized words) as described in the [docs](http://pythonhosted.org/pyenchant/tutorial.html#personal-word-lists) – patrick Mar 26 '17 at 12:19
  • @patrick I'll have to add every Bulgarian word with every form and so on to the PWL. It hardly seems like a good solution. I'm already using the PWL, but this is for domain-specific words AFAIK, not for huge lists. – Ivailo Karamanolev Mar 26 '17 at 14:46
  • True, but a list with initial capitalization can easily be created automatically. I don't know anything about spelling conventions etc.; I just wanted to point you to it. – patrick Mar 26 '17 at 15:09
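The automatic approach patrick describes could look roughly like this: read the stems from bg_BG.dic, give each an initial capital, and write them to a personal word list. This is a hedged sketch; the helper names and output file name are hypothetical, and it assumes the usual Hunspell .dic layout (an approximate word count on the first line, then one `word/FLAGS` entry per line). It only capitalizes stems, so inflected forms generated by the .aff affix rules are not covered, which is the limitation raised in the reply above.

```python
from pathlib import Path
from typing import List


def capitalized_stems(dic_path: Path, encoding: str = "utf-8") -> List[str]:
    """Hypothetical helper: return the initial-capital form of every stem in
    a Hunspell .dic file that starts with a lower-case letter."""
    result = []
    seen = set()
    with open(str(dic_path), encoding=encoding) as f:
        next(f, None)  # first line of a .dic is the approximate word count
        for line in f:
            fields = line.split()
            if not fields:
                continue
            stem = fields[0].split("/", 1)[0]  # drop affix flags after '/'
            if stem[:1].islower():
                # Upper-case only the first letter; keep the rest unchanged.
                cap = stem[:1].upper() + stem[1:]
                if cap not in seen:
                    seen.add(cap)
                    result.append(cap)
    return result


def write_pwl(words: List[str], pwl_path: Path) -> None:
    """Hypothetical helper: write one word per line, UTF-8 encoded."""
    with open(str(pwl_path), "w", encoding="utf-8") as f:
        for w in words:
            f.write(w + "\n")
```

The resulting file could then be loaded alongside the dictionary with `enchant.DictWithPWL('bg_BG', 'bg_BG_capitalized.txt')`.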

0 Answers