
I'd like to dump hunspell's pl_PL dictionary.

I found the solution: unmunch /usr/share/hunspell/pl_PL.dic /usr/share/hunspell/pl_PL.aff

But there's a problem with the encoding.

Part of the output:

ambasadorowaniom
ambasadorowaniach
ambasadorowa�
ambasadoruj�cy
ambasadoruj�cym

I've also tried filtering the output through iconv, but that didn't solve the problem:

   affix: z�c� 4, strip: �� 2
   affix: z�ce 4, strip: �� 2
   affix: z�cej 5, strip: �� 2
stable 50 num is 470 flag G
parsing line: MAP 8
parsing line: MAP a�
parsing line: MAP c�

How can I solve this problem?

Mateusz Jagiełło

2 Answers


If you are still wondering how to solve this (I ran into it tonight), or if someone hits it in the future and looks here: iconv does solve the problem. The dictionary files appear to be encoded in ISO-8859-2 (Latin-2), so convert from that encoding to UTF-8:

unmunch pl_PL.dic pl_PL.aff 2>/dev/null | iconv -f iso-8859-2 -t utf-8
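Rather than guessing the source encoding, it can usually be read from the SET line of the .aff file. The following is only a sketch under a few assumptions: aff_encoding and dump_dictionary are made-up helper names, the .aff file has a SET line, and the value on that line is a name your iconv recognizes (not every SET value is).

```shell
#!/bin/sh
# Sketch only: aff_encoding and dump_dictionary are hypothetical helper names.

# Print the encoding named on the first SET line of a .aff file
# (for example "ISO8859-2"), with any carriage return removed.
aff_encoding() {
    awk '$1 == "SET" { print $2; exit }' "$1" | tr -d '\r'
}

# Expand a dictionary with unmunch and convert the word list to UTF-8.
# $1 = .dic file, $2 = .aff file; messages on stderr are discarded.
dump_dictionary() {
    enc=$(aff_encoding "$2")
    unmunch "$1" "$2" 2>/dev/null | iconv -f "$enc" -t UTF-8
}

# Example call (paths taken from the question):
# dump_dictionary /usr/share/hunspell/pl_PL.dic /usr/share/hunspell/pl_PL.aff > pl_PL.txt
```

If iconv rejects the name found on the SET line, passing the encoding explicitly, as in the command above, is the fallback.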

Short version: It's a problem with your terminal. Switch to another one, such as xterm.

Longer: Strange. It should be UTF-8. Are you sure it is not caused by your console or terminal not supporting UTF-8? Check the result in any UTF-8-capable graphical editor, and check your locale settings.
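One way to separate a terminal problem from a file problem is to test the bytes directly instead of looking at them on screen. A minimal sketch, assuming iconv is installed; is_valid_utf8 is a made-up helper name and words.txt is a placeholder file name:

```shell
#!/bin/sh
# Sketch only: is_valid_utf8 is a hypothetical helper name.

# Succeed if the file decodes cleanly as UTF-8, fail otherwise.
# iconv exits with a non-zero status when it meets an invalid byte sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Example:
# locale charmap        # character encoding of the current locale
# unmunch pl_PL.dic pl_PL.aff 2>/dev/null > words.txt
# is_valid_utf8 words.txt && echo "valid UTF-8" || echo "not UTF-8"
```

If the check fails while locale charmap reports UTF-8, the output itself is in another encoding and the terminal is not at fault.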

Disclaimer: I want to help. But since I cannot comment, request clarification, or send a message to the user (1 reputation point), I have to post this as an answer so that it is not deleted.

szszsz
  • Yes, this is weird. My locale is **LANG=en_US.UTF-8**, and my terminal (gnome-terminal) supports UTF-8. :) I didn't actually resolve this problem; I just used aspell instead of hunspell as a small workaround. – Mateusz Jagiełło Oct 23 '15 at 13:33
  • For some dictionaries it may be an artifact of an earlier conversion to UTF-8. For example, the Australian dictionary had errors like that for é, which appeared years ago after a faulty conversion (affecting only 200 words out of 100,000), but the error has been fixed in the latest version of the dictionary. – Tom Anderson Sep 15 '16 at 08:58
  • @MateuszJagiełło how did you use aspell? – Furkan Gözükara Mar 03 '17 at 00:27