I have just started diving into NLP and want to use hunspell for tokenization. So far, however, I have not been able to get it to work: `hunspell_check` returns FALSE every time I call it.

I have reinstalled hunspell several times and checked that the dictionaries are actually present (they are). I also tried other hunspell functions (such as `hunspell()`), but they do not work either. Interestingly, no error message of any kind is shown.

> hunspell_check("work")
[1] FALSE

> dictionary(lang = "en_US")
<hunspell dictionary>
 affix: C:\Users\NilsKlähn\Documents\R\win-library\3.6\hunspell\dict\en_US.aff 
 dictionary: C:\Users\NilsKlähn\Documents\R\win-library\3.6\hunspell\dict\en_US.dic 
 encoding: ISO8859-1 
 wordchars: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþ 
 added: 0 custom words


I expect `hunspell_check("work")` to return TRUE rather than FALSE, since the word is spelled correctly. The dictionary itself seems to be fine.
avrFreak
  • I can't recreate the error; it works fine for me (also on Windows, with the latest versions of R and hunspell). The only difference is that mine says UTF-8 encoding, whereas yours is ISO8859-1. I'd be surprised if that made a difference to a simple word like 'work', but it might be worth investigating. Note the warning on encodings at the bottom of the `hunspell_check` help page. – Andrew Gustar Sep 30 '19 at 12:28
  • Good hint! For some reason I cannot manage to change the encoding. Do you know how to do that? I have already tried reinstalling both R and hunspell... – avrFreak Oct 02 '19 at 11:47
  • I'm not sure, in this context. I guess it is the encoding of the dictionary that you have, so perhaps try to get the UTF-8 version of en_US. Although, for me, I think that was installed by default with hunspell, so if you have reinstalled it, perhaps the encoding means something else. Another thought - do you have any other installations of hunspell on your system - that can sometimes cause problems? – Andrew Gustar Oct 02 '19 at 17:44
  • No, there is no other installation of hunspell. Also, the dictionary itself is delivered in UTF-8. That's why I'm so sceptical about the encoding now. – avrFreak Oct 03 '19 at 15:08

0 Answers