To generate ASCII values to feed into $bitmask, I use:

perl -E 'say chr 101' > value_ascii.txt

My question comes up when I try to generate an ascii value for the number 1185644. That is, now I want the value of $bitmask to be the ascii value corresponding to the numeric value 1185644.

If I use perl -E 'say chr 1185644' > ascii_expected.txt, the value I obtain does not give me the correct range, from the 1185644th subset to the 1185744th subset. So I think the conversion perl -E 'say chr 1185644' > ascii_expected.txt is not working.

I have been trying to correctly acquire the ascii value of 1185644 by doing:

perl -E 'say chr 1185644' > ascii_expected.txt

but what is printed is:

ô¡<0x9d>¬

and I get this error:

Wide character in say at -e line 1.

I tried to understand how to use nice_string:

sub nice_string {
    join("",
        map { $_ > 255                    # if wide character...
              ? sprintf("\\x{%04X}", $_)  # \x{...}
              : chr($_) =~ /[[:cntrl:]]/  # else if control character...
                ? sprintf("\\x%02X", $_)  # \x..
                : quotemeta(chr($_))      # else quoted or as themselves
        } unpack("W*", $_[0]));           # unpack Unicode characters
}
nice_string("foo\x{1185644}bar\n")

but I couldn't get it to work; the result does not seem to be the correct value.

I tried to do:

use open OUT => ':locale'; 
open(O, ">koi8");
print O chr(1185644); 
close O;

but the output printed to the file is:

\x{12176C}

and I get this error:

Code point 0x12176C is not Unicode, may not be portable in print at p1.pl line 3.

Note: I expect an ascii_value for 1185644 that I can use as a variable in Perl, for example $b = 'ascii_value';.

7beggars_nnnnm
  • There is no such thing. If you explain *why* you want it, someone might be able to help you do something useful that suits your needs, but what you're asking for is meaningless. – hobbs Aug 19 '21 at 14:23
  • Unicode does not support such a code point (it only goes up to 0x10FFFF). UCS now supports only the same characters as Unicode (it no longer goes up to 0x7FFFFFFF). – Giacomo Catenazzi Aug 19 '21 at 14:53
  • @hobbs I added the contextual usage for which I am requesting help. – 7beggars_nnnnm Aug 19 '21 at 15:10
  • Can you add REAL expected output? Sorry, there's just no way that 1185644 will be `'ascii_value'` in any reasonable encoding. Also show expected output for all your functions. – Oleg V. Volkov Aug 19 '21 at 17:48
  • Do you use the term "_ascii value_" loosely? Can you clarify what you mean by it, perhaps by a few examples? – zdim Aug 19 '21 at 18:22
  • It could be an ascii string that would equate to that in a specific encoding, eg `1185644 == 0x12176C == 0x12 0x17 0x6C == chr(0x12) . chr(0x17) . chr(0x6c)` but that really doesn't make much sense for the value you provided – sbingner Aug 20 '21 at 22:28

1 Answer

My question comes up when I try to generate an ascii value for the number 1185644.

This makes no sense. The ASCII character set only has 128 different characters (0-127).

I want the value of $bitmask to be the ascii value corresponding to the numeric value 1185644.

To create a string that consists of a character with a value of 1185644, you can use chr(1185644). ASCII is not involved in this.
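As a minimal sketch of that point (the variable name $bitmask is borrowed from the question; nothing else is assumed), chr returns a one-character string whose single character has the requested value:

```perl
use strict;
use warnings;

# chr() takes a number and returns a string of exactly one character
# with that value. No character set such as ASCII is involved.
my $bitmask = chr(1185644);

printf "length: %d\n", length($bitmask);   # length: 1
printf "ord:    %d\n", ord($bitmask);      # ord:    1185644
```

The string exists only in memory at this point; the trouble starts when it has to be written out as bytes.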

get error: Wide character in say at -e line 1.

File handles without an encoding layer added expect strings of bytes, which is to say a string where every character has a value in 0..255. 1185644 is clearly not in that range, so you provided an invalid string. When this happens, Perl assumes you meant to encode the string using utf8[1] and does so, but warns you that this happened ("Wide character").
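To see which bytes that implicit encoding produces, the same step can be done explicitly. This is a sketch using the core utf8::encode function; the variable names are only illustrative:

```perl
use strict;
use warnings;

my $s = chr(1185644);

# utf8::encode converts a string of characters, in place, into the
# bytes of its utf8 encoding. Work on a copy to keep $s intact.
my $bytes = $s;
utf8::encode($bytes);

# Show each byte as two hex digits.
my $hex = join ' ', map { sprintf '%02x', ord } split //, $bytes;
print "$hex\n";   # f4 a1 9d ac
```

Viewed as Latin-1, the bytes F4 A1 9D AC are ô, ¡, an unprintable control byte, and ¬, which matches the output shown in the question.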

my output print to file is: \x{12176C}

When a file handle has an encoding layer, Unicode Code Points are expected to be provided.[2] Unicode Code Point 1185644 isn't part of the KOI-8 character set, and thus can't be encoded by KOI-8. \x{12176C} was used in place of the un-encodable character.


You asked to do something impossible, so what were you really trying to do? It's unclear, but perhaps you want to be able to store a string containing character 1185644 in a file so that you can get it back. There's only one character encoding that I know of that can encode characters beyond Unicode, and that's utf8 (not to be confused with UTF-8).[1] It can encode any character Perl strings support.
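A rough sketch of such a round trip, assuming the :utf8 I/O layer is acceptable for the data: an in-memory handle stands in for a real file here, and the "not Unicode" warning is silenced on purpose because the value is known to lie beyond Unicode.

```perl
use strict;
use warnings;
no warnings 'utf8';   # silence "Code point 0x12176C is not Unicode"

my $s = chr(1185644);

# Write the string through the :utf8 layer (Perl's lax utf8, not
# strict UTF-8). A scalar reference is used in place of a file name.
open(my $out, '>:utf8', \my $buf) or die "open for write failed: $!";
print {$out} $s;
close($out);

# Read it back through the same layer.
open(my $in, '<:utf8', \$buf) or die "open for read failed: $!";
my $back = do { local $/; <$in> };
close($in);

printf "stored %d bytes, got back character %d\n", length($buf), ord($back);
```

With a real file, the scalar references would be replaced by a file name; the layers stay the same.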

Of course, you could use your own format. For example, we could extend UCS-4be to 64 bit:

pack "Q>*", unpack "W*", $s     # "UCS-8be" encoder
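For illustration, here is that encoder paired with a matching decoder. The names encode_wide and decode_wide are made up for this sketch, and the "Q" format requires a Perl built with 64-bit integer support:

```perl
use strict;
use warnings;

# Each character becomes one 64-bit big-endian unsigned integer.
sub encode_wide { return pack "Q>*", unpack "W*", $_[0] }

# The reverse: each group of 8 bytes becomes one character.
sub decode_wide { return pack "W*", unpack "Q>*", $_[0] }

my $s     = "foo" . chr(1185644) . "bar";
my $bytes = encode_wide($s);
my $back  = decode_wide($bytes);

printf "%d characters -> %d bytes\n", length($s), length($bytes);
print $back eq $s ? "round trip ok\n" : "mismatch\n";
```

The cost of such a format is size: every character takes 8 bytes, including plain ASCII letters.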

But one has to wonder why you're using strings of large characters in the first place.


  1. utf8 is a Perl-specific extension of UTF-8.

  2. Values larger than those supported by Unicode are also accepted by some encodings. 1185644 is such a value. I'm still going to call them Unicode Code Points for lack of a better name. This doesn't rule out 1185644 as a valid input.

ikegami