I have a large piece of production code that works. But after I set up a new environment in a virtual machine, I have one issue: every time I upload a binary file, it gets mangled by Unicode conversions.
Here is the sub where the issue occurs:
sub save_uploaded_file
{
    # $file is obtained by param(zip)
    my ($file) = @_;
    my ($fh, $fname) = tmpnam;
    my ($br, $buffer);
    # commenting out next 2 lines doesn't help either
    binmode $file, ':raw';
    binmode $fh, ':raw';
    while ($br = sysread($file, $buffer, 16384))
    {
        syswrite($fh, $buffer, $br);
    }
    close $fh;
    return $fname;
}
It is used to upload zip archives, but they arrive malformed (their size is always larger than the original). I looked inside them with a hex editor and found many Unicode replacement characters (U+FFFD), encoded in UTF-8 as EF BF BD.
I found that the total number of bytes read is larger than the size of the original file, so the problem is already present at sysread.
Text files upload fine.
Update: Here is a hex dump of the first few bytes of the transferred file:
0000000: 504b 0304 1400 0000 0800 efbf bd1c efbf PK..............
0000010: bd3e efbf bd1d 3aef bfbd efbf bd02 0000 .>....:.........
0000020: efbf bd05 0000 0500 1c00 422e 786d 6c55 ..........B.xmlU
0000030: 5409 0003 5cef bfbd efbf bd4d 18ef bfbd T...\......M....
0000040: efbf bd4d 7578 0b00 0104 efbf bd03 0000 ...Mux..........
0000050: 0404 0000 00ef bfbd efbf bdef bfbd 6bef ..............k.
And the original one:
0000000: 504b 0304 1400 0000 0800 b81c d33e df1d PK...........>..
0000010: 3aa0 8102 0000 a405 0000 0500 1c00 422e :.............B.
0000020: 786d 6c55 5409 0003 5cd4 fc4d 18c7 fc4d xmlUT...\..M...M
0000030: 7578 0b00 0104 e803 0000 0404 0000 008d ux..............
0000040: 94df 6bdb 3010 c7df 03f9 1f0e e1bd 254e ..k.0.........%N
0000050: ec74 6c85 d825 2bac 9442 379a c25e ca8a .tl..%+..B7..^..
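Comparing the two dumps, the damage looks like what you get when raw bytes are decoded as UTF-8 and then encoded again: each byte that is not valid UTF-8 becomes U+FFFD (EF BF BD), while ASCII bytes pass through unchanged. Below is a minimal sketch of that round trip. It only assumes that something in the upload path does this; I have not confirmed where or whether it actually happens. The sample is the nine bytes starting at offset 0x0a of the original file.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Nine bytes starting at offset 0x0a of the ORIGINAL archive (from the dump above).
my $original = "\xb8\x1c\xd3\x3e\xdf\x1d\x3a\xa0\x81";

# Assumption: some layer treats the raw bytes as UTF-8 text.
# With the default CHECK, malformed bytes are replaced by U+FFFD.
my $chars = decode('UTF-8', $original);

# The characters are then written back out as UTF-8.
my $mangled = encode('UTF-8', $chars);

# Print the result as hex for comparison with the transferred-file dump.
print unpack('H*', $mangled), "\n";
```

The printed hex string can be compared with the transferred file's dump from offset 0x0a onward. The result is also longer than the input, which would fit the size growth I see.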
Update 2: The software in use is CentOS 5.6, Perl 5.8.8, Apache 2.2.3.