Is it right to say that if utf8::is_utf8 returns true for a string read from STDIN, then a decoding layer is set on STDIN?

#!/usr/bin/env perl
use warnings;
use strict;
use 5.10.0;
use open qw( :std :utf8 );

my $in = <STDIN>;
say utf8::is_utf8( $in ) ? 'YES' : 'NO';    # YES

binmode STDIN, ':bytes';
$in = <STDIN>;
say utf8::is_utf8( $in ) ? 'YES' : 'NO';    # NO

binmode STDIN, ':encoding(latin1)';
$in = <STDIN>;
say utf8::is_utf8( $in ) ? 'YES' : 'NO';    # YES
sid_com
  • I'd be cautious about jumping to conclusions like that. All `utf8::is_utf8` really tells you is whether Perl considers the string to be utf8-encoded internally, which is also the case for latin1-decoded input, as the last lines of your example show. It doesn't know how it got the string or what's been done to it elsewhere. What is it you're really trying to achieve? – Leeft Jul 18 '14 at 10:18
  • I would like to know whether `binmode STDIN, ':encoding(...)'` is set. If it is, I would have the subroutine return decoded strings; otherwise, undecoded strings. – sid_com Jul 18 '14 at 11:35
  • Are you asking specifically about `<>`, or about scalars in general? (Gotta go, but will be back later.) – ikegami Jul 18 '14 at 12:28
  • It is for the `readline` method in https://metacpan.org/source/KUERBIS/Term-ReadLine-Tiny-0.002/lib/Term/ReadLine/Tiny.pm. Until now the encoding/decoding in the method worked for me but I don't know if I have tested it well enough. – sid_com Jul 18 '14 at 15:35
  • @sid_com: If your situation is complex then you may want to use the facilities of the [`Encode`](https://metacpan.org/module/Encode) module directly instead of getting `readline` to do it implicitly – Borodin Jul 18 '14 at 17:01
  • Term::ReadLine::Tiny::readline doesn't read from a file handle, at least not directly. It reads using a plugin, and god knows what the plugin does. As such, `utf8::is_utf8` does not indicate anything. – ikegami Jul 18 '14 at 17:05

1 Answer

The is_utf8 function (whether it is from utf8 or from Encode) tells you only whether a string has Perl's internal UTF8 flag set. That flag is pretty much a consequence of what you yourself have told Perl the string contains, and it is very different from the data being valid UTF-8.
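To illustrate that the flag describes the string's internal storage rather than where the string came from, here is a minimal sketch that turns the flag on without any filehandle being involved. The helper name `flag` is made up for the demonstration; `utf8::upgrade` and `utf8::is_utf8` are core functions.

```perl
use strict;
use warnings;

# Hypothetical helper: report the internal UTF8 flag of a string.
sub flag { return utf8::is_utf8( $_[0] ) ? 'YES' : 'NO' }

my $bytes = "caf\xE9";    # four characters, stored as bytes, flag off
my $copy  = $bytes;
utf8::upgrade($copy);     # same four characters, stored as UTF-8, flag on

print flag($bytes), "\n";                              # NO
print flag($copy),  "\n";                              # YES
print $bytes eq $copy ? "equal\n" : "different\n";     # equal
```

The two strings compare equal even though only one of them is flagged, so the flag on its own cannot tell you whether a decoding layer produced the string.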

If you want to check the capabilities of a file handle then you should take a look at the PerlIO::Layers module. A call like

query_handle(*STDIN, 'utf8')

will return true if the handle is UTF-8-capable, that is, if either :utf8 or :encoding(utf8) has been set on it.

If you want to check specifically for :encoding(utf8) then you need

query_handle(*STDIN, 'layer', 'encoding')

but note that this will show only whether there is an :encoding() layer of any sort, which could be :encoding(iso-8859-1).
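As a rough sketch of how that query could drive the decision asked about in the comments, the following wraps it in a small predicate. The helper name `has_encoding_layer` and the in-memory handles are assumptions made for the demonstration, and PerlIO::Layers has to be installed from CPAN since it is not a core module.

```perl
use strict;
use warnings;
use PerlIO::Layers qw(query_handle);    # CPAN module, not core

# Hypothetical helper: true if the handle carries any :encoding(...) layer.
sub has_encoding_layer {
    my ($fh) = @_;
    return query_handle( $fh, 'layer', 'encoding' ) ? 1 : 0;
}

# In-memory handles keep the demonstration self-contained.
my $data = "abc\n";
open my $decoded, '<:encoding(UTF-8)', \$data or die "open: $!";
open my $plain,   '<',                 \$data or die "open: $!";

for my $fh ( $decoded, $plain ) {
    print has_encoding_layer($fh)
        ? "handle decodes its input\n"
        : "handle returns raw bytes\n";
}
```

A subroutine could call such a predicate on `\*STDIN` and only decode by hand when it returns false.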

If you really need to check which encoding is in place, the only way I know is to examine the return value of get_layers from the same module. It returns a list of array references, one for each PerlIO layer in effect on the handle, something like this:

(
  ["unix",     undef,  ["CANREAD", "OPEN"]],
  ["encoding", "utf8", ["FASTGETS", "CANREAD", "LINEBUF", "UTF8"]],
)
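Assuming the three-element structure shown above (layer name, argument, flags), a sketch of pulling the encoding name out of that list might look like this. The helper name `encoding_of` is invented for the example, and the exact spelling of the name that comes back is whatever the layer reports, so the code prints it rather than comparing it to a fixed string.

```perl
use strict;
use warnings;
use PerlIO::Layers qw(get_layers);    # CPAN module, not core

# Hypothetical helper: return the argument of the first :encoding(...)
# layer found on the handle, or nothing if there is no such layer.
sub encoding_of {
    my ($fh) = @_;
    for my $layer ( get_layers($fh) ) {
        my ( $name, $argument, $flags ) = @{$layer};
        return $argument if $name eq 'encoding';
    }
    return;
}

my $data = "abc\n";
open my $fh, '<:encoding(iso-8859-1)', \$data or die "open: $!";

my $enc = encoding_of($fh);
print defined $enc ? "encoding layer: $enc\n" : "no encoding layer\n";
```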
Borodin