
I'm trying to check a list of tags against the English dictionary. I'm using pyenchant and I keep getting an error. The error seems to occur right after it reads a "?". I attempted to strip out all punctuation using the string library and the following code:

for punc in string.punctuation:
    title = title.replace(punc,'')

but somehow this character, which looks like a ?, is still throwing off the dictionary.

Code snippet:

if word not in stopwords.words('english'):
    print word, "=", d.check(word) 
    if d.check(word):       
        tags.append(word.lower())

Response:

Learning = True
Lens = True
Children = True
Pumkincom = False
Pumkin = False

** (process:49042): CRITICAL **: enchant_dict_check: assertion `g_utf8_validate(word, len, NULL)' failed
     ? =

I'm using Python 2.7.3 and pyenchant-1.6.5-py2.7

EDIT: I think I solved this problem by checking whether len(word)==1, but I would like to know why this happens.

user1495088

1 Answer


I faced this problem before, and in my case it was caused by non-English letters in the text. I advise you to make sure that word contains only English letters before checking it.
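A minimal sketch of such a filter, assuming the tags are ordinary strings. The helper name `is_english_letters` and the sample list are made up for illustration and are not part of pyenchant; the idea is to call d.check(word) only for words that pass the filter.

```python
import string

# The 52 unaccented letters a-z and A-Z.
ASCII_LETTERS = set(string.ascii_letters)

def is_english_letters(word):
    """Return True if word is non-empty and contains only the letters a-z and A-Z."""
    return bool(word) and all(ch in ASCII_LETTERS for ch in word)

# Hypothetical sample input: three plain words, one accented word,
# one replacement character, and an empty string.
words = ["Learning", "Lens", "Pumkin", "caf\u00e9", "\ufffd", ""]

# Keep only the words that are safe to hand to the dictionary.
safe_words = [w for w in words if is_english_letters(w)]
print(safe_words)
```

This is stricter than removing string.punctuation, which only covers ASCII punctuation and so leaves any other non-letter character in place.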

Yasmin
  • I doubt that non-English characters are what cause the problem. When I call dictionary.check() with non-English characters, it simply returns False. I think this problem is related to encoding: the string passed to dictionary.check() is not a valid UTF-8 string. – Saurabh Jain Jul 05 '16 at 15:40
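The assertion in the question's output names g_utf8_validate, which fits the encoding explanation. As a rough sketch, assuming the words arrive as raw byte strings, a word can be tested for UTF-8 validity before it is checked. The helper name `is_valid_utf8` and the sample bytes are illustrative only; the lone byte 0x96 stands in for whatever stray byte the real input contains.

```python
def is_valid_utf8(raw):
    """Return True if the byte string raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Hypothetical samples: plain ASCII, a correctly encoded accented word,
# and a single byte that is not valid UTF-8 on its own.
samples = [b"Learning", b"caf\xc3\xa9", b"\x96"]
results = [is_valid_utf8(s) for s in samples]
print(results)
```

A byte like the last one may show up as "?" in a terminal, and on its own it has length 1, which may be why the len(word)==1 workaround happened to skip it.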