How to change encoding on fly in SELECT statement?

Question

I have a table with a column, which has cp1251_general_ci collation. I don't want to change column collation, but I want to get data in utf8 encoding.

Is there a way to select any data somehow in a way that it looks just like a data with utf8_general_ci collation?

I.e. I need something like this

SELECT CONVERT_TO_UTF8(weirdColumn) FROM weirdTable

"ЂЃЉЌ" is Cyrillic. It is hex 80818A8D in cp1251; hex D082D083D089D08C in utf8 or utf8mb4. Why change the encoding? That is, what will the next action be? Many operations can be done without _explicitly_ asking to convert the encoding.33 — Rick James, Oct 29 '21 at 21:02

Bill Karwin · Accepted Answer · 2021-09-26T21:34:45.083

Here's a demo table using the cp1251 encoding. I'll insert some Cyrillic characters into it.

mysql> CREATE TABLE weirdTable (weirdColumn text) ENGINE=InnoDB DEFAULT CHARSET=cp1251;

mysql> insert into weirdTable values ('ЂЃЉЌ');

mysql> select * from weirdTable;
+-------------+
| weirdColumn |
+-------------+
| ЂЃЉЌ        |
+-------------+

Use MySQL's CONVERT() function to force the characters to a different encoding:

mysql> select convert(weirdColumn using utf8) as weirdColumnUtf8 from weirdTable;
+-----------------+
| weirdColumnUtf8 |
+-----------------+
| ЂЃЉЌ            |
+-----------------+

Here's proof that the result has been converted to utf8. I create a table using metadata from the query result:

mysql> create table w2 
    as select convert(weirdColumn using utf8) as weirdColumnUtf8 from weirdTable;
Query OK, 1 row affected (0.07 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> show create table w2\G
*************************** 1. row ***************************
       Table: w2
Create Table: CREATE TABLE `w2` (
  `weirdColumnUtf8` longtext CHARACTER SET utf8
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
1 row in set (0.00 sec)

mysql> select * from w2;
+-----------------+
| weirdColumnUtf8 |
+-----------------+
| ЂЃЉЌ            |
+-----------------+

On my MySQL instance, utf8mb4 is the default character encoding. That's okay; it's a superset of utf8, and the utf8 encoding is enough to store these characters. However, I generally recommend if you use utf8, there's no reason not to use utf8mb4.

If you change the character encoding, you cannot keep the cp1251 collation. Collations are specific to encodings. But you can use one of the collations associated with utf8 or utf8mb4. You can see the available collations for a given character encoding:

mysql> SHOW COLLATION WHERE Charset = 'utf8';
+--------------------------+---------+-----+---------+----------+---------+---------------+
| Collation                | Charset | Id  | Default | Compiled | Sortlen | Pad_attribute |
+--------------------------+---------+-----+---------+----------+---------+---------------+
...
| utf8_general_ci          | utf8    |  33 | Yes     | Yes      |       1 | PAD SPACE     |
| utf8_general_mysql500_ci | utf8    | 223 |         | Yes      |       1 | PAD SPACE     |
...

To get the desired collation together with encoding conversion one can use: `SELECT CONVERT(weirdColumn using utf8) COLLATE utf8_general_ci FROM weirdTable` — klm123, Sep 27 '21 at 08:18
utf8_general_ci is the default collation for utf8, so that's not necessary. But yes, that's the right syntax. — Bill Karwin, Sep 27 '21 at 13:54

How to change encoding on fly in SELECT statement?

1 Answers1