
PyEnchant mishandles non-ASCII characters, so the spellcheck fails. My girlfriend is German, so I know that "häßlich" is a real German word, and I have also confirmed it with several other spellchecking services.

The script file is saved with the encoding "ANSI as UTF-8". I have also tried encoding and decoding the word with various character encodings, without success.


#!/usr/bin/python
# -*- coding: utf-8 -*-

# Python bindings for the enchant spellcheck
import enchant

# Enchant dictionary
enchantdict = enchant.Dict("de_DE")

# Define german word for "ugly"
word = "häßlich"

# Print the original word and the spellchecked version of it
print word, "=", enchantdict.check(word)

And the output is as follows: häßlich = False


Also, if I change the script encoding to plain ANSI, this is what I get:

hõ¯lich =
** (python.exe:1096): CRITICAL **: enchant_dict_check: assertion `g_utf8_validate(word, len, NULL)' failed
Traceback (most recent call last):
  File "C:\Temp\koe.py", line 14, in <module>
    print word, "=", enchantdict.check(word)
  File "C:\Python27\lib\site-packages\enchant\__init__.py", line 577, in check
    self._raise_error()
  File "C:\Python27\lib\site-packages\enchant\__init__.py", line 551, in _raise_error
    raise eclass(default)
enchant.errors.Error: Unspecified Error

I am using pyenchant-1.6.5.win32.exe and python-2.7.3.msi on Windows 7.


...And if you have a better spellchecker in mind, please tell me about it, I will test it out :)

elfduck
  • What exactly do you mean by "change the script encoding to plain ANSI"? If you mean ASCII, that's impossible; you can't type "häßlich" in ASCII. If you mean something else… well, it depends on what you mean. Meanwhile, `print name` may not necessarily do the right thing; it depends on your terminal being set to the same encoding and Python's sys default encoding (although there are some hacks to work around common issues in Windows). Still, as Eric MSFT says, none of this is supposed to be doable at all unless you use Unicode strings. – abarnert Sep 19 '12 at 18:22

1 Answer


You are getting tripped up by the fact that there are two types of strings in Python 2: byte strings and Unicode strings. You need a 'u' in front of the string literal for it to be a Unicode string:

word = u"häßlich"
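To see why the prefix matters, here is a minimal sketch using only the standard library. It assumes that "plain ANSI" means the Windows-1252 code page (typical for a Western European Windows install, but an assumption here), and `is_valid_utf8` is a hypothetical helper, not part of PyEnchant. It shows that the same word becomes different bytes under each encoding, and that the ANSI bytes are not valid UTF-8, which is consistent with the `g_utf8_validate` assertion in the question's error output.

```python
# -*- coding: utf-8 -*-

word = u"häßlich"  # Unicode string: a sequence of characters, not bytes

utf8_bytes = word.encode("utf-8")   # bytes as stored in a UTF-8 source file
ansi_bytes = word.encode("cp1252")  # assumption: "plain ANSI" is Windows-1252


def is_valid_utf8(data):
    """Hypothetical helper: True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 7 characters, but 9 bytes in UTF-8 (ä and ß take two bytes each)
print("%d characters, %d UTF-8 bytes, %d ANSI bytes"
      % (len(word), len(utf8_bytes), len(ansi_bytes)))
print("UTF-8 bytes valid as UTF-8: %s" % is_valid_utf8(utf8_bytes))
print("ANSI bytes valid as UTF-8:  %s" % is_valid_utf8(ansi_bytes))
```

Passing a Unicode string avoids the guesswork: the library can encode it however it needs to, instead of receiving bytes whose encoding depends on how the script file happened to be saved.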

Also, häßlich is the old spelling of hässlich (the latter is in the dictionary and will be returned as a suggestion). If you want häßlich to be considered correctly spelled, you can add it to your personal word list:

enchantdict.add(word)
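Putting the pieces together, a rough sketch of checking a word and collecting suggestions might look like the following. The function name `check_german` is made up for illustration, and the guard around the import is an assumption so the script still runs where PyEnchant or the `de_DE` dictionary is not installed. The output depends on the installed dictionary, so none is shown.

```python
# -*- coding: utf-8 -*-
try:
    import enchant  # PyEnchant; the import fails if the C library is missing
except ImportError:
    enchant = None


def check_german(word, tag="de_DE"):
    """Hypothetical helper: return (is_correct, suggestions), or None if
    PyEnchant or the requested dictionary is unavailable."""
    if enchant is None or not enchant.dict_exists(tag):
        return None
    dictionary = enchant.Dict(tag)
    if dictionary.check(word):
        return True, []
    # Misspelled (or old spelling): ask the dictionary for alternatives
    return False, dictionary.suggest(word)


result = check_german(u"häßlich")  # note the u prefix: a Unicode string
if result is None:
    print("PyEnchant or the de_DE dictionary is not available")
else:
    print(result)
```

Calling `add` on the dictionary object, as shown above, should only be needed once per word, since it writes to the personal word list rather than to the current session alone.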

Eric MSFT