2

When attempting to insert (for example, which is a 4-byte unicode char), both MySQL (5.7) and MariaDB (10.2/10.3/10.4) give the same error:

Incorrect string value: '\xF0\x9F\x92\xA9'

The statement:

mysql> insert into bob (test) values ('');

Here's my database's charset/collation:

mysql> select @@collation_database;                                                                     +----------------------+
| @@collation_database |
+----------------------+
| utf8mb4_unicode_ci   |
+----------------------+
1 row in set (0.00 sec)

mysql> SELECT @@character_set_database;                                                                 +--------------------------+
| @@character_set_database |
+--------------------------+
| utf8mb4                  |
+--------------------------+
1 row in set (0.00 sec)

The server's character set:

mysql> show global variables like '%character_set_server%'\G;                                           *************************** 1. row ***************************
Variable_name: character_set_server
        Value: utf8mb4

The table:

create table bob ( `test` TEXT NOT NULL );
mysql> SHOW FULL COLUMNS FROM bob;
+-------+------+--------------------+------+-----+---------+-------+---------------------------------+---------+
| Field | Type | Collation          | Null | Key | Default | Extra | Privileges                      | Comment |
+-------+------+--------------------+------+-----+---------+-------+---------------------------------+---------+
| test  | text | utf8mb4_unicode_ci | NO   |     | NULL    |       | select,insert,update,references |         |
+-------+------+--------------------+------+-----+---------+-------+---------------------------------+---------+
1 row in set (0.00 sec)

Can anyone point me in the right direction?

Danny Kopping
  • 4,862
  • 2
  • 29
  • 38
  • The only way to make this work is to execute `SET NAMES utf8mb4` before my `INSERT`. Is this the *only* way? – Danny Kopping Apr 06 '19 at 22:06
  • BTW—A codepoint is 21 bits in the Unicode character set. It is Unicode's UTF-8 character encoding that uses 4 8-bit code units for some codepoints. "4-byte Unicode char" sounds strange and is ambiguous with two UTF-16 code units and 1 UTF-32 code unit. – Tom Blodget Apr 07 '19 at 15:02
  • @infomaniac - Usually the 'better' way is available in the "connection parameters" that the client uses. (Each client does that a different way.) `SET NAMES` is a fallback that works if you don't have (or don't use) the connection parameters. – Rick James Apr 16 '19 at 18:31

1 Answers1

3

Yes, as you commented, you need to use SET NAMES utf8mb4.

Your 4-byte character must pass from your client through the database connection and into a table. All of those must support utf8mb4. If any one of them does not support utf8mb4, then 4-byte characters will not be able to get through.

SET NAMES utf8mb4 makes the database session expect clients to send string using that encoding. The default for character_set_client on MySQL 5.7 is utf8, so you need to set it to utf8mb4.

In MySQL 8.0.1 and later, the default character_set_client is utf8mb4 already, so you won't need to change it.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828