#!/usr/local/bin/perl
use strict;
use warnings;
use Text::SpellChecker;
my $text = "coördinator";
my $checker = Text::SpellChecker->new( text => $text );
while ( my $word = $checker->next_word ) {
    print "Bad word is $word\n";
}
Output: Bad word is rdinator
Desired: Bad word is coördinator
The module breaks when $text contains Unicode characters. Any idea how this can be solved?
I have Aspell 0.50.5 installed, which this module uses. I think this might be the culprit.
Edit: Since Text::SpellChecker requires either Text::Aspell or Text::Hunspell, I removed Text::Aspell, installed Hunspell and Text::Hunspell, and then ran:
$ hunspell -d en_US -l < badword.txt
coördinator
This shows the correct result: hunspell reports the whole word as misspelled. That means something is wrong with either my code or Text::SpellChecker.
Taking Miller's suggestion into consideration, I tried the following:
#!/usr/local/bin/perl
use strict;
use warnings;
use Text::SpellChecker;
use utf8;
binmode STDOUT, ":encoding(utf8)";
my $text = "coördinator";
my $flag = utf8::is_utf8($text);
print "Flag is $flag\n";
print "Text is $text\n";
my $checker = Text::SpellChecker->new(text => $text);
while (my $word = $checker->next_word) {
    print "Bad word is $word\n";
}
OUTPUT:
Flag is 1
Text is coördinator
Bad word is rdinator
Does this mean the module cannot handle UTF-8 characters properly?
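To show what I suspect is happening, here is a minimal sketch that reproduces the same "rdinator" fragment using only core Perl. The three patterns are hypothetical tokenizers I made up for illustration; I have not confirmed that any of them matches what Text::SpellChecker does internally. The string is written with byte escapes so the result does not depend on how the script file is saved.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The UTF-8 bytes of "coördinator" (ö is the two bytes C3 B6).
my $bytes = "co\xC3\xB6rdinator";

# Decoding turns C3 B6 into the single character U+00F6.
my $chars = decode('UTF-8', $bytes);

# Hypothetical word patterns, NOT taken from Text::SpellChecker:
my @on_bytes   = $bytes =~ /(\w+)/g;        # \w against undecoded bytes
my @ascii_only = $chars =~ /([A-Za-z']+)/g; # ASCII-only letter class
my @unicode    = $chars =~ /(\p{L}+)/g;     # any Unicode letter

binmode STDOUT, ':encoding(UTF-8)';
print "bytes with w class:  @on_bytes\n";   # co rdinator
print "chars with ASCII:    @ascii_only\n"; # co rdinator
print "chars with Unicode:  @unicode\n";    # coördinator
```

The first two variants split the word at ö and leave "rdinator" as a separate token, which is exactly the output I see. The ASCII-only variant does so even on a properly decoded string, which would be consistent with `use utf8` making no difference in my second attempt, but that is only a guess about the cause.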