I am trying to copy cp1251-encoded text to the clipboard.

#!/usr/bin/perl -w

use Clipboard;
use Encode;

my $ClipboardOut = "A bunch of cyrillic characters - а-б-в-г \n";
Encode::from_to($ClipboardOut, 'utf-8', 'cp1251');

Clipboard->copy($ClipboardOut);

Instead of Cyrillic letters, "?" characters are pasted into any Windows application. If I remove the line with Encode, the Cyrillic letters come out as "a"s with different diacritics:

A bunch of cyrillic characters   à-á-â-ã 

I guess I'm missing something extremely simple, but I'm stuck on it. Can somebody help me, please?

Disco
  • In what encoding do you save the script? – choroba Aug 10 '20 at 12:03
  • In cp1251. When Ctrl-C - Ctrl-V from Notepad - everything works normally. – Disco Aug 10 '20 at 12:13
  • So why do you tell Perl to convert from UTF-8? – choroba Aug 10 '20 at 12:21
  • Because without converting, it places " à-á-â-ã " on the clipboard instead of " а-б-в-г ". I guessed that Perl automatically converts the script's code page to some internal representation, and I thought that would be UTF-8... If not (and I now see it definitely is not), what do I need to specify instead of "utf-8"? – Disco Aug 10 '20 at 12:30
  • It really is stored in cp1251. I checked this by printing to a console that works in cp866: the correct text is shown when I use Encode::from_to($ConsoleOut, 'cp1251', 'cp866'); So the problem seems to be in the Clipboard output. Maybe there are other ways to copy text to the clipboard? – Disco Aug 10 '20 at 12:48
  • What happens when you try to store a decoded string? `Encode::decode('cp1251', $ClipboardOut)` – choroba Aug 10 '20 at 13:03
  • The same "à-á-â-ã" is pasted from clipboard to any window. – Disco Aug 10 '20 at 13:27
  • You should add `use utf8` to your script, then you should simply have to do `Clipboard->copy(Encode::encode('cp1251',$ClipboardOut))` but it still does not work. Maybe the clipboard only supports unicode? According to the [source](https://metacpan.org/source/SHLOMIF/Clipboard-0.26/lib/Clipboard/Win32.pm#L8) `Clipboard` uses `Win32::Clipboard->Set()`, which calls [SetClipboardData()](https://metacpan.org/source/JDB/Win32-Clipboard-0.58/Clipboard.xs#L626) in Win32 API. [Here](https://learn.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-setclipboarddata) is the documentation. – Håkon Hægland Aug 10 '20 at 14:59
  • What OS? Windows? – ikegami Aug 10 '20 at 19:58

2 Answers

In Windows, Clipboard expects text encoded using the system's Active Code Page. That's because Clipboard is just a wrapper for Win32::Clipboard. And while Win32::Clipboard allows you to receive arbitrary Unicode text from the clipboard, it doesn't allow you to place arbitrary Unicode text on the clipboard. So using that module directly doesn't help.

This is limiting. For example, my machine's ACP is cp1252, so I wouldn't be able to place Cyrillic characters on the clipboard using this module.
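
Whether a given code page can represent a piece of text can be checked before touching the clipboard. The following is a minimal sketch using only the core Encode module; the helper name `can_encode` is hypothetical, and the code page names are passed in by hand here (on Windows the value would come from `Win32::GetACP()`, as in the first solution of this answer).

```perl
use strict;
use warnings;
use utf8;

use Encode qw( encode );

# Hypothetical helper: true if every character of $text exists in $codepage.
sub can_encode {
    my ($codepage, $text) = @_;
    # FB_CROAK makes encode() die on an unmappable character instead of
    # silently substituting "?". LEAVE_SRC keeps encode() from altering $text.
    my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();
    return eval { encode($codepage, $text, $check); 1 } ? 1 : 0;
}

my $text = "а-б-в-г\n";
for my $cp (qw( cp1251 cp1252 )) {
    printf "%s: %s\n", $cp,
        can_encode($cp, $text) ? "ok" : "cannot represent the text";
}
```

Without a strict check, `encode` replaces unmappable characters with a substitution character, which is one way to end up with "?" on the clipboard.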

Assuming your system's ACP supports the Cyrillic characters in question, here are two solutions: (I used Win32::Clipboard directly, but you could use Clipboard the same way.)


Source code encoded using UTF-8 (This is normally ideal)

use utf8;

use Encode           qw( encode );
use Win32            qw( );
use Win32::Clipboard qw( );

# String of decoded text aka Unicode Code Points because of `use utf8;`
my $text_ucp = "а-б-в-г\n";

my $acp = "cp" . Win32::GetACP();
my $clip = Win32::Clipboard();
$clip->Set(encode($acp, $text_ucp));

Source code encoded as per Active Code Page

Perl expects source code to be encoded using ASCII (no utf8;, the default) or UTF-8 (with use utf8;). However, string and regex literals are "8-bit clean" when no utf8; is in effect (the default), meaning that any byte that doesn't correspond to an ASCII character will result in a character with the same value as the byte.

use Win32::Clipboard qw( );

# Text encoded using source's encoding (because of lack of `use utf8`),
# which is expected to be the Active Code Page.
my $text_acp = "а-б-в-г\n";

my $clip = Win32::Clipboard();
$clip->Set($text_acp);
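
To illustrate why the second solution depends on the source encoding matching the Active Code Page, this sketch decodes the same bytes under two code pages. It uses only the core Encode module; the byte string is written out by hand and is assumed to be what a cp1251-encoded source file would contain for the literal above.

```perl
use strict;
use warnings;

use Encode qw( decode );

# Bytes of "а-б-в-г\n" as stored in a cp1251-encoded source file.
my $bytes = "\xE0-\xE1-\xE2-\xE3\n";

for my $cp (qw( cp1251 cp1252 )) {
    my $chars = decode($cp, $bytes);
    # %vX prints the code point of each character in hex, joined by dots.
    printf "%s: %vX\n", $cp, $chars;
}
```

Read as cp1251, the bytes give the Cyrillic code points U+0430 to U+0433. Read as cp1252, the same bytes give U+00E0 to U+00E3, that is "à-á-â-ã", which is the text pasted in the question.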
ikegami
  • I have double checked - Active Code Page is cp1251 (`my $acp = "cp" . Win32::GetACP(); print STDOUT $acp; ` gives `cp1251` on console). But both snippets still give "à-á-â-ã" when the Clipboard content is pasted anywhere in Windows. – Disco Aug 11 '20 at 10:03
  • Add `printf("%vX - %s - %vX\n", $text_ucp, $acp, encode($acp, $text_ucp));` (first snippet) or `printf("%vX\n", $text_acp);` (second snippet) and provide the output. – ikegami Aug 11 '20 at 10:29
  • Hold on, those aren't even in the cp1251 code page. Even if you did something wrong, that's not possible. Try pasting into `notepad`. Put something else (like `abc`) in the clipboard before running the program. – ikegami Aug 11 '20 at 10:32
  • Copied `abc` into Clipboard, pasted it in Notepad - `abc`. Then ran the second snippet with console output - got `E0.2D.E1.2D.E2.2D.E3.A`. Pasted into Notepad - `à-á-â-ã`. – Disco Aug 11 '20 at 10:46
  • As far as I can tell, what you tell me is impossible. This should only happen for a cp1251 file on a cp1252 system. – ikegami Aug 11 '20 at 11:02
  • Yes, and that should only happen on a system that uses the cp1252 code page. – ikegami Aug 11 '20 at 11:14
  • I have double checked it - Active Code Page is reported as "cp1251" (`my $acp = "cp" . Win32::GetACP(); print STDOUT $acp;` gives `cp1251` on console), and letters from the code are printed on the console correctly only when converted by `Encode::from_to($ConsoleOut, 'cp1251', 'cp866')`. – Disco Aug 11 '20 at 11:46
  • That just means that your script is encoded using cp1251 and that your terminal's OEM CP is 866. (The OEM CP can be changed on a per-console basis, but the ANSI/Active CP is system-wide. For example, if you entered `chcp 1251` in the console, you could skip the conversion.) – ikegami Aug 11 '20 at 13:53
  • No. Having a file encoded using cp1251 only means you have a file encoded using cp1251. Files can have any encoding. For example, I have lots of UTF-8 files, but that doesn't mean my ACP is 65001 (UTF-8). – ikegami Aug 11 '20 at 15:13
  • `my $acp = "cp" . Win32::GetACP(); print STDOUT $acp;` gives `cp1251` on console - it definitely means that Active CP is 1251, not 1252... – Disco Aug 11 '20 at 16:17

I found a temporary solution: the script generates a .bat file containing `echo blah-blah-blah | clip`, runs it, and then deletes it.
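
The temporary .bat file could probably be avoided by piping to `clip` directly from Perl. This is only a sketch: the helper name `pipe_to_command` is hypothetical, and which encoding `clip` expects on its input is not established in this thread, so the bytes are passed through unchanged and the caller is assumed to have encoded them appropriately.

```perl
use strict;
use warnings;

# Hypothetical helper: write $bytes to the standard input of $command.
sub pipe_to_command {
    my ($command, $bytes) = @_;
    open(my $pipe, '|-', $command)
        or die "Cannot start '$command': $!";
    binmode($pipe);    # pass the bytes through unchanged
    print {$pipe} $bytes
        or die "Cannot write to '$command': $!";
    # close() on a pipe waits for the command and reports its failure.
    close($pipe)
        or die "'$command' failed: " . ($! || "exit status $?");
    return 1;
}

# Only attempt the real clipboard on Windows, where clip is available.
if ($^O eq 'MSWin32') {
    pipe_to_command('clip', "abc\n");
}
```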

Disco
  • That's not going to change how the program works. Something else is also different. Let me know when you figure out what relevant thing you changed – ikegami Aug 11 '20 at 21:33
  • For now this solution gives me what I need: the desired text in the clipboard. If I had enough experience to find the sources of such bugs, I would not need help here. Sorry, but your "impossible" did not help me at all, and neither did the "minus"... – Disco Aug 11 '20 at 21:46
  • I don't doubt that you have a solution; I'm just asking that you let me know what it is. I'd be interested in finding out what the problem was – ikegami Aug 11 '20 at 21:47