2

I am using Perl 5.20.2 and MySQL 5.5.57 on a Debian 8 machine. I recently discovered that MySQL's utf8 tables are limited to three-byte-characeters. As a consequence I can not store emojis. So, I tried utfmb4 tables which are supposed to address the issue. I changed the table from utf8 to utf8mb4 from inside the mysql client:

ALTER DATABASE `mydb` CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
ALTER TABLE `mydb`.`mytable` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE `mydb`.`mytable` CHANGE `object` `object` VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Storing data in mytable seems to work, at least I can see the expected emoji in phpMyAdmin. However, when reading from the table I receive a 4-character result with 3 unprintable charaters. The following program is supposed to print the same emoji twice:

#!/usr/bin/perl

use 5.10.1;
use warnings;
use strict;
use DBI;

binmode(STDOUT, ':utf8');

my $object = "\x{1F600}";
my $hd_db  = DBI->connect('DBI:mysql:mydb:localhost', 'user', 'password');
$hd_db->do('SET NAMES utf8mb4');

# cleanup
my $delete = $hd_db->prepare("DELETE FROM mytable");
$delete->execute;

my $insert = $hd_db->prepare("INSERT INTO mytable (object) VALUES ('" . $object . "')");
$insert->execute;
my $select = $hd_db->prepare("SELECT * FROM mytable");
$select->execute;
my $row    = $select->fetchrow_hashref;

say $object;
say $row->{'object'};

Expected output:



Actual output:


�

Seems like a bug to me. Any suggestion how to work around it?

EDIT: SELECTing the data from within the mysql client also shows the expected emoji

mysql> SET SESSION CHARACTER_SET_CLIENT = utf8mb4;
mysql> SET SESSION CHARACTER_SET_RESULTS = utf8mb4;
mysql> SELECT * FROM mytable;
+--------+
| object |
+--------+
|       |
+--------+
Marcus
  • 315
  • 1
  • 10

3 Answers3

4

You told MySQL to use UTF-8 for communication, but you also need to tell DBD::mysql to decode the data (or do it yourself).

You want

my $dbh = DBI->connect('DBI:mysql:mydb:localhost', 'user', 'password', {
   mysql_enable_utf8mb4 => 1,
})
   or die($DBI::errstr);

which is equivalent to

my $dbh  = DBI->connect('DBI:mysql:mydb:localhost', 'user', 'password')
   or die($DBI::errstr);

$dbh->do('SET NAMES utf8mb4')
   or die($dbh->errstr);

$dbh->{mysql_enable_utf8mb4} = 1;
ikegami
  • 367,544
  • 15
  • 269
  • 518
  • 1
    I accepted this answer as it is the way to go for `DBI` versions >= 4.041_01. Debian 8 ships with 3.0.17. For that version decoding works when opting for `mysql_enable_utf8 => 1`, see [this post](http://blogs.perl.org/users/mike_b/2016/12/dbdmysql-all-your-utf-8-bugs-are-belong-to-us.html) – Marcus Oct 14 '17 at 11:52
1

The workaround is to let MySQL treat everything as bytes and to do the encoding in your application.

use Encode qw(encode decode);

my $object = "\x{1F600}";
my $hd_db  = DBI->connect('DBI:mysql:mydb:localhost', 'user', 'password');
$hd_db->do('SET NAMES latin1');

...

my $insert = $hd_db->prepare("INSERT INTO mytable (object) VALUES ('" . 
    encode("UTF-8",$object) . "')"); # or equiv statement with placeholders
$insert->execute;

...

my $select = $hd_db->prepare("SELECT * FROM mytable");
$select->execute;
my $row    = $select->fetchrow_hashref;
say $object;
say decode("UTF-8",$row->{'object'});
mob
  • 117,087
  • 18
  • 149
  • 283
  • Thanks for the suggestion, but unfortunately I would have to revisit more than 1k db queries in my application. And worse, they would have to be tested.. – Marcus Oct 13 '17 at 13:46
0

"\x{1F600}"; is "Unicode", not "utf8". They are related, but they are not the same encoding.

You need UTF-8 (as the non-mysql world calls it) and utf8mb4 (as MySQL calls it).

is hex F09F9880 (in utf8mb4); it is 😀 if you convert through CHARACTER SET latin1 ("Mojobake")

Please run SELECT HEX(object) ... to see whether you get those 4 hex bytes or something else. Then we will know whether to focus on the INSERT or the SELECT.

You say "actual output" -- but where is this? A web page? Is it configured for UTF-8? Or something else? If it is your commandline window, then make sure it is set for UTF-8. In windows, that is done via chcp 65001 .

You mentioned

mysql> SET SESSION CHARACTER_SET_CLIENT = utf8mb4;
mysql> SET SESSION CHARACTER_SET_RESULTS = utf8mb4;

That's only 2 of the 3 that need to be set. Better to simply do

SET NAMES utf8mb4;
Rick James
  • 135,179
  • 13
  • 127
  • 222
  • It was console output and it worked out of the box with both Ubuntu and W10/Putty 0.7. Win7/Putty 0.7 does not work out of the box, though I have not tried `chcp`ing – Marcus Oct 14 '17 at 11:54
  • Compare `my.cnf`. You may find different defaults. What versions _of MySQL_? – Rick James Oct 14 '17 at 14:52