
I would like to print the contents of an associative array. For this I'm using Data::Dumper.

For example, if the associative array is called %w, I write:

  print OUT Dumper(\%w);

Here's the problem: some words, such as "récente", are printed as "r\x{e9}cente".

If I just write:

  print OUT %w;

I have no problems: "récente" is printed as "récente".

All text files used by the script are UTF-8 encoded. I also use the utf8 pragma, and I always specify the character encoding when opening files.

For example:

  open( IN, '<', $file_in);
  binmode(IN,":utf8");

I'm pretty sure the problem is related to Data::Dumper. Is there a way to solve this, or another way to print out the contents of an associative array?

Thank you.

KeyPi
  • It worries me slightly that you are talking about "associative arrays". Since the release of Perl 5 almost twenty years ago, we have called them "hashes". If you are reading tutorials or books that still call them "associative arrays" then the information that you are getting is horribly out of date. – Dave Cross Apr 01 '14 at 12:00
  • Re "I'm pretty sure that the problem is related to Data::dumper." What problem? Re "Is there a way to solve this or another way to print out the content of an associative array?" Sure, you can print it out any way you want to. – ikegami Apr 01 '14 at 13:47

4 Answers


This is intentional. The output of Data::Dumper is meant to reproduce the same data structure when evaluated as Perl code. To limit the effect of character encodings, non-ASCII characters are dumped using escapes. In addition, it's sensible to set $Data::Dumper::Useqq = 1 so that any unprintable characters are dumped using escapes as well.
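A minimal sketch of that round trip, assuming only the core Data::Dumper module: with Useqq set, the dump is plain ASCII, yet evaluating it rebuilds a hash with the original accented keys. The hash contents are made-up example data, and Sortkeys is set only to get a stable key order.

```perl
use strict;
use warnings;
use utf8;                          # the hash literal below contains non-ASCII text
use Data::Dumper;

binmode STDOUT, ':encoding(UTF-8)';

my %w = ('récente' => 1, 'déjà' => 2);   # example data

$Data::Dumper::Useqq    = 1;       # escape non-ASCII and unprintable characters
$Data::Dumper::Sortkeys = 1;       # stable key order

my $dumped = Dumper(\%w);
print $dumped;                     # accented characters appear as \x{...} escapes

# The escaped text is still Perl source: evaluating it rebuilds the hash.
my $VAR1;                          # Dumper's default output assigns to $VAR1
my $copy = eval $dumped or die "eval failed: $@";

for my $key (sort keys %$copy) {
    print "$key round-trips\n" if exists $w{$key} && $w{$key} == $copy->{$key};
}
```

The readable form shows up again only when the keys are printed through a UTF-8 handle after the eval, which is the point of the escapes: the dump itself never depends on an output encoding.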

Data::Dumper isn't really meant as a way to display data structures – if you have specific formatting requirements, just write the necessary code yourself. For example:

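use strict;
use warnings;
use utf8;
use feature 'say';
my $filename = 'hash.txt';   # example output file name
open my $out, ">:utf8", $filename or die "Can't open $filename: $!";
my %hash = (
    bárewørdş => '–Uni·code–',
);

# Print one "key: value" line per entry, sorted by key.
say { $out } "{";
for my $key (sort keys %hash) {
    say { $out } "  $key: $hash{$key}";
}
say { $out } "}";
close $out or die "Can't close $filename: $!";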

produces

{
  bárewørdş: –Uni·code–
}
amon

You can also use Data::Dumper::AutoEncode.

use utf8;
use Data::Dumper::AutoEncode;

my $hash_ref = { 'récente' => 1 };   # example data
warn eDumper($hash_ref);

The module is on CPAN and can be installed with:

cpan Data::Dumper::AutoEncode
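If installing a CPAN module isn't an option, a similar effect can be approximated with core modules only: encode every string in a copy of the structure to UTF-8 bytes, then dump that copy. This is a rough sketch under my own assumptions, not a description of how Data::Dumper::AutoEncode is implemented; encode_deep is a hypothetical helper that only walks hash refs, array refs and plain scalars.

```perl
use strict;
use warnings;
use utf8;                          # the hash literal below contains non-ASCII text
use Data::Dumper;
use Encode qw(encode);

# Hypothetical helper: return a copy of a nested structure in which every
# string, including hash keys, is encoded to UTF-8 bytes.
# Only hash refs, array refs and plain scalars are handled.
sub encode_deep {
    my ($data) = @_;
    if (ref $data eq 'HASH') {
        return { map { (encode('UTF-8', $_), encode_deep($data->{$_})) } keys %$data };
    }
    if (ref $data eq 'ARRAY') {
        return [ map { encode_deep($_) } @$data ];
    }
    return (defined $data && !ref $data) ? encode('UTF-8', $data) : $data;
}

my %w = ('récente' => ['déjà', 42]);   # example data

$Data::Dumper::Sortkeys = 1;
my $dumped = Dumper(encode_deep(\%w));

# $dumped already holds UTF-8 bytes, so print it through a handle without
# an encoding layer to avoid encoding it a second time.
binmode STDOUT, ':raw';
print $dumped;
```

Because the dump contains bytes rather than characters, it should be written to a handle without a :utf8 or :encoding layer; otherwise the accented characters get encoded twice.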

Fotis_zzz

This works for me:

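use strict;
use warnings;
use Data::Dumper;
# Use the pure-Perl implementation; it falls back to Data::Dumper::qquote
# for strings containing characters above \x{FF}.
$Data::Dumper::Useperl = 1;
binmode STDOUT, ":utf8";
{ no warnings 'redefine';
    # Replacement qquote: return a single-quoted literal that keeps the
    # characters as-is. Backslashes and single quotes are escaped so the
    # dump remains valid Perl.
    sub Data::Dumper::qquote {
        my $s = shift;
        $s =~ s/([\\'])/\\$1/g;
        return "'$s'";
    }
}
my $s = "rcente\x{3a3}";
my %w = ($s=>12);
print Dumper(\%w), "\n";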
perreal

Data::Dumper is a debugging tool. It shows you exactly what the string contains without making the output susceptible to encoding errors. That's not a problem, that's a feature. What it emitted ("r\x{e9}cente") is a sufficiently readable representation of the string you had (code points 72 E9 63 65 6E 74 65).
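To check that the escaped form and the accented literal really are the same string, a small sketch (assuming the script file itself is saved as UTF-8) can compare them and list their code points with the vector flag of printf:

```perl
use strict;
use warnings;
use utf8;    # decode the literal below as characters

my $typed   = 'récente';        # written with the accented character
my $escaped = "r\x{e9}cente";   # the form Data::Dumper printed

# Both literals denote the same seven characters.
print $typed eq $escaped ? "same string\n" : "different strings\n";

# Print the code points in hex, separated by spaces.
printf "%*vX\n", ' ', $typed;   # prints: 72 E9 63 65 6E 74 65
```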

ikegami