This seems to be a bug in Perl. I had thought the problem was that Windows code page 65001 is not really supported for the console, but I finally wrote test programs in C and Perl, and the problem does not happen in the C version. It happens no matter where the Unicode character occurs in the line, but the line you're printing must be wider than the console supports.
Here is my C program:
#include "stdafx.h"
#include "Windows.h"
/* Harmless if stdafx.h already pulls these in. */
#include <stdio.h>
#include <tchar.h>
int _tmain(int argc, _TCHAR* argv[])
{
    /* Switch the console output code page to UTF-8 (65001). */
    BOOL b = SetConsoleOutputCP(65001);
    printf("set console output codepage returned %d\n", b);
    /* Lines of increasing width, alternating between α and é. */
    printf("cαfe\n");
    printf("1234567890 café\n");
    printf("1234567890 1234567890 cαfe\n");
    printf("1234567890 1234567890 1234567890 café\n");
    printf("1234567890 1234567890 1234567890 1234567890 cαfe\n");
    printf("1234567890 1234567890 1234567890 1234567890 1234567890 café\n");
    printf("1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 cαfe\n");
    printf("1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 café\n");
    printf("1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 cαfe\n");
    printf("1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 café\n");
    printf("1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 cαfe\n");
    printf("1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 café\n");
    printf("1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 cαfe\n");
    return 0;
}
And here is my Perl program:
#
use utf8;
binmode STDOUT, ':utf8';
printf STDOUT "cαfe\n";
printf STDOUT "1234567890 café\n";
printf STDOUT "1234567890 1234567890 cαfe\n";
printf STDOUT "1234567890 1234567890 1234567890 café\n";
printf STDOUT "1234567890 1234567890 1234567890 1234567890 cαfe\n";
printf STDOUT "1234567890 1234567890 1234567890 1234567890 1234567890 café\n";
printf STDOUT "1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 cαfe\n";
printf STDOUT "1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 café\n";
printf STDOUT "1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 cαfe\n";
printf STDOUT "1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 café\n";
printf STDOUT "1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 cαfe\n";
printf STDOUT "1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 café\n";
printf STDOUT "1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 cαfe\n";
UPDATE
No, I was wrong. With the help of some of the guys at #perl on irc.perl.org, it turns out to be a bug in the Microsoft API: WriteFile is documented to report the number of bytes written, but it actually reports the number of characters written, which depends on the code page. A bug was filed in March 2010.
There is more discussion in the MSDN forums.
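To make the bytes-versus-characters mismatch concrete, here is a small portable simulation in C. It does not call the Windows API: `fake_write` is a hypothetical stand-in that writes every byte but reports the character count, and `write_all` is an assumed "retry on short write" loop of the kind many I/O layers use. I have not checked that Perl's I/O layer works exactly this way, and the sketch does not model the dependence on console width, so treat it as one plausible illustration of why a wrong count can lead to repeated output.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Count the code points in a UTF-8 buffer by skipping continuation
   bytes (those of the form 10xxxxxx). */
size_t utf8_chars(const char *s, size_t nbytes)
{
    size_t count = 0;
    for (size_t i = 0; i < nbytes; i++) {
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            count++;
    }
    return count;
}

/* Hypothetical stand-in for the misbehaving call: appends all nbytes
   to out, but reports the number of characters instead of bytes.
   The caller must make out large enough for everything appended. */
size_t fake_write(char *out, const char *buf, size_t nbytes)
{
    assert(out != NULL && buf != NULL);
    strncat(out, buf, nbytes);
    return utf8_chars(buf, nbytes);
}

/* Assumed caller behaviour: a count smaller than requested is taken
   as a short write, so the supposedly unwritten tail is sent again. */
void write_all(char *out, const char *buf, size_t nbytes)
{
    while (nbytes > 0) {
        size_t n = fake_write(out, buf, nbytes);
        if (n == 0)
            break;          /* no progress reported: give up */
        buf += n;
        nbytes -= n;
    }
}
```

For the UTF-8 bytes of "café\n" (`"caf\xC3\xA9\n"`, 6 bytes but 5 characters), `write_all` ends up with the line followed by a second newline in `out`, because the loop believes the last byte was never sent and writes it again.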
UPDATE 2
I posted about this problem on Michael Kaplan's blog, "Sorting it all out", and he responded with an article entitled "Hidden in plain site: a purloined letter kind of a bug report". He's a Microsoft internationalization expert, so you will surely find some insights there.