
I have an application that is being expanded to the UK, and I need to add support for the Latin-9 (ISO-8859-15) character set. I have done some Googling but found nothing solid about what the process involves. Any tips?

Here is some code (just the Unicode-related bits):

use Unicode::String qw(utf8 latin1 utf16);

# How to call
$encoded_txt = $self->unicode_encode($item->{value});

# Function part
sub unicode_encode {

    shift() if ref($_[0]);
    my $toencode = shift();
    return undef unless defined($toencode);

    Unicode::String->stringify_as("utf8");
    my $unicode_str = Unicode::String->new();


    # encode Perl UTF-8 string into latin1 Unicode::String
    #  - currently only Basic Latin and Latin 1 Supplement
    #    are supported here due to issues with Unicode::String .
    $unicode_str->latin1( $toencode );
    ...

Any help would be great. Thanks.

EDIT: I did find this post: http://czyborra.com/charsets/iso8859.html

Phill Pafford
  • Why do you need to support Latin-9? Is there something specific in that format for data you'll be receiving? If it's certain characters that you need to support rather than a specific character set, I'd recommend going with full-on Unicode and UTF-8. – mpeters Jun 14 '10 at 19:23
  • Latin-9 is like Latin-1 with the euro symbol; it's a popular choice if you don't want to or can't jump to Unicode. – leonbloy Jun 15 '10 at 00:06
  • Latin-9 is a business requirement – Phill Pafford Jun 15 '10 at 12:21

2 Answers


Unicode::String is ancient, and designed to add Unicode support to older Perls. Modern versions of Perl (5.8.0 and up) have native Unicode support. Look at the Encode module and the :encoding layer. You can get a list of the supported encodings in your Perl with perldoc Encode::Supported.

Basically, you just need to decode from Latin-9 on input and encode to Latin-9 on output. The rest of the time, you should work with Perl's native Unicode strings (which Perl stores internally as UTF-8).

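# Read a Latin-9 file:
open(my $in, '<:encoding(Latin9)', 'some/file')
    or die "Cannot open some/file: $!";
my $line = <$in>; # Decoded from Latin-9 into a Perl character string
close($in);

# Write a Latin-9 file:
open(my $out, '>:encoding(Latin9)', 'other/file')
    or die "Cannot open other/file: $!";
print $out $line; # Encoded from Perl's internal form to Latin-9
close($out) or die "Cannot close other/file: $!";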
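
For strings that are already in memory rather than read through a file handle, the same conversion can be sketched with the Encode functions directly. This is a minimal sketch, assuming Perl 5.8 or later; the variable names are illustrative, and the euro sign is written as the escape `\x{20AC}` so the example does not depend on how the source file is saved.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Character string containing the euro sign (U+20AC), written as an
# escape so the result does not depend on the source file's encoding.
my $text = "UK \x{20AC}";

# Encode the character string into Latin-9 (ISO-8859-15) bytes.
my $bytes = encode('iso-8859-15', $text);

# Show the bytes in hex; in Latin-9 the euro sign is the single byte 0xA4.
my $hex = join ' ', map { sprintf '%02X', ord } split //, $bytes;
print "$hex\n";    # prints 55 4B 20 A4

# Decode the Latin-9 bytes back into a Perl character string.
my $round_trip = decode('iso-8859-15', $bytes);
print "round trip ok\n" if $round_trip eq $text;
```

If a literal such as "UK €" is typed directly into the source instead, the file needs to be saved as UTF-8 and `use utf8;` added, so that Perl treats the literal as characters rather than raw bytes before encoding it.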
cjm
  • Thanks, I did look at the reference you gave but I didn't see Latin-9. Any other references/advice? – Phill Pafford Jun 14 '10 at 18:39
  • Thanks again. I do have one other issue: some of the clients running the old software might not be able to upgrade to a new version of Perl. Is there any support for Latin-9 using the Unicode::String way instead of :encoding? I just want to make it as easy as possible for clients to upgrade. – Phill Pafford Jun 14 '10 at 18:48
  • Try `perldoc Encode::Supported` to get a list of supported encodings. search.cpan.org doesn't seem to find the current version (because it got moved to a different place in the tarball). – cjm Jun 14 '10 at 18:51
  • Perl 5.6.1 is 9 years old now. (Even 5.6.2 is almost 7.) It's time for them to upgrade. I doubt it would be hard to add Latin9 support to Unicode::String, but you'd probably have to do it yourself. – cjm Jun 14 '10 at 18:56
  • Hmm, I'm trying to use encode for this with the string "UK €" and it's not working. Code: encode("iso-8859-15", "UK €"). Any thoughts as to why? – Phill Pafford Jun 15 '10 at 14:59

In perldoc Encode::Supported it's referred to as ISO-8859-15 (!). Here is some heavily trimmed down output from perldoc:

           Lang/Regions  ISO/Other Std.  DOS     Windows Macintosh  Others
       ----------------------------------------------------------------
       Latin9 [4]    iso-8859-15
       ----------------------------------------------------------------

       [4] Nicknamed Latin0; the Euro sign as well as French and Finnish
           letters that are missing from 8859-1 were added.
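
To confirm what a particular Perl installation supports, the names can also be checked programmatically. The following is a sketch, assuming the core Encode module is available; it resolves the `Latin9` alias to its canonical name and lists the ISO-8859 encodings that can be loaded. The variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Resolve the alias to its canonical name; find_encoding returns
# undef if this Perl does not know the encoding.
my $enc       = find_encoding('Latin9');
my $canonical = defined $enc ? $enc->name : undef;
print "Latin9 resolves to: ",
    (defined $canonical ? $canonical : 'not supported'), "\n";

# List every ISO-8859 variant this installation can load.
my @iso = grep { /^iso-8859-/ } Encode->encodings(':all');
print "$_\n" for @iso;
```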
kovacsbv