
I'm working with UTF-16LE encoded CSV files. I use the Perl module Text::CSV_XS to handle the data:

use Text::CSV_XS;

my $csv = Text::CSV_XS->new({ binary => 1, sep_char => ';', quote_char => undef });
open my $io, '<:encoding(UTF-16LE)', $csv_file or die "$csv_file: $!";
my $header_row = $csv->getline($io);

Printing the first row using Data::Dumper, the BOM is shown in the output:

print Dumper $header_row->[0];
# output:
# $VAR1 = "\x{feff}first header col";

According to perldoc, the BOM is preserved because I explicitly state that the content is UTF-16LE. When I use :encoding(UTF-16) instead, the BOM is removed.
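That difference can be checked with core Perl alone. A minimal sketch (the file name bom_demo.csv is made up for the demo):

```perl
use strict;
use warnings;

# Write a small UTF-16LE file starting with a BOM.
my $file = 'bom_demo.csv';
open my $out, '>:encoding(UTF-16LE)', $file or die "$file: $!";
print $out "\x{FEFF}a;b\n";
close $out;

# Reading with the byte order spelled out: the BOM is just another
# character and survives as U+FEFF at the start of the data.
open my $le, '<:encoding(UTF-16LE)', $file or die "$file: $!";
my $line_le = <$le>;
close $le;

# Reading with plain UTF-16: the BOM is consumed to detect the byte
# order and never reaches the decoded string.
open my $any, '<:encoding(UTF-16)', $file or die "$file: $!";
my $line_any = <$any>;
close $any;

print $line_le  =~ /^\x{FEFF}/ ? "LE: BOM kept\n"     : "LE: BOM stripped\n";     # LE: BOM kept
print $line_any =~ /^\x{FEFF}/ ? "UTF-16: BOM kept\n" : "UTF-16: BOM stripped\n"; # UTF-16: BOM stripped

unlink $file;
```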

But I would like to keep the explicit LE in the code to document the required encoding. I assume this is a good thing; if not, please tell me.

But then, I have to handle the BOM, e.g. by writing: $header_row->[0] =~ s/^\x{FEFF}//;

Is this normal? Do I have to care about BOMs in my strings when working with UTF-16 encoded files, or am I doing something wrong?

  • Not if you use File::BOM – ikegami Dec 22 '14 at 14:04
  • If you are keeping the "LE" for documentation purposes, there's no shame in putting it in a comment or POD. – tjd Dec 22 '14 at 16:40
  • The point of the BOM is that you don't have to know the byte order (and the input could be either way). However, if there is no BOM, you have to declare it yourself. – brian d foy Dec 22 '14 at 21:36
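ikegami's File::BOM suggestion might look like the sketch below. It assumes the CPAN module File::BOM is installed (it is not core Perl), and the demo file name is made up:

```perl
use strict;
use warnings;

# File::BOM is a CPAN module, not core Perl; skip the demo if absent.
BEGIN {
    eval { require File::BOM; File::BOM->import('open_bom'); 1 }
        or do { warn "File::BOM not installed, skipping\n"; exit 0 };
}

# Write a small UTF-16LE file with a BOM for the demo.
my $file = 'bom_demo.csv';
open my $out, '>:encoding(UTF-16LE)', $file or die "$file: $!";
print $out "\x{FEFF}first header col;second\n";
close $out;

# open_bom() detects the BOM, pushes the matching :encoding(...) layer,
# and consumes the BOM itself, so no U+FEFF reaches the data. The third
# argument is the fallback mode used when the file has no BOM; the
# return value is the encoding that was detected.
my $encoding = open_bom(my $io, $file, ':encoding(UTF-16LE)');
my $header = <$io>;    # starts with "first header col", no \x{FEFF}
close $io;
unlink $file;
```

The handle $io can then be passed to $csv->getline() as before, with no manual s/^\x{FEFF}// step.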

0 Answers