MySQL Character Set and Collate

Question

I use MySQL 5.7, but I do not know how to configure it to display Vietnamese correctly.

I created the database like this:

CREATE DATABASE brt
    DEFAULT CHARACTER SET utf8 COLLATE utf8_vietnamese_ci;

After that, I used "LOAD DATA LOCAL INFILE" to load data written in Vietnamese into the database.

But the Vietnamese characters in the query results are often displayed incorrectly.

For the detailed code and files, please see my GitHub repository at the following link:

https://github.com/fivermori/mysql

Please show me how to solve this. Thanks.

You must take a lot of parameters into account: console/client codepage settings, possible client-side pre-conversions, connection and server charset settings, and so on. — Akina, Aug 04 '21 at 06:39
This may not relate to your problem, but avoid MySQL's original, crippled "utf8" character set; use "utf8mb4" instead. — ysth, Aug 04 '21 at 06:45

score 0 · Answer 1 · answered Aug 04 '21 at 08:32

As @ysth suggests, using utf8mb4 will save you a world of trouble going forward. If you change your create statements to look like this, you should be good:

CREATE DATABASE `brt` DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
USE `brt`;

DROP TABLE IF EXISTS `fixedAssets`;
CREATE TABLE IF NOT EXISTS `fixedAssets` (
    `id`            int(11)        UNSIGNED NOT NULL    AUTO_INCREMENT,
    `code`          varchar(250)     UNIQUE NOT NULL    DEFAULT '',
    `name`          varchar(250)            NOT NULL    DEFAULT '',
    `type`          varchar(250)            NOT NULL    DEFAULT '',
    `createdDate`   timestamp               NOT NULL    DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
CREATE INDEX `idx_fa_main` ON `fixedAssets` (`code`);

I've tested this using the data that you provided and get the expected query results:

name
----------------------------------------------------------------
Mould Terminal box cover BN90/112 612536030 39 tháng
Mould W2206-045-9911-VN #3 ( 43 tháng)
Mould Flange BN90/B5 614260271 ( 43 tháng)
Mould 151*1237PH04pC11 ( 10 năm)
Transfer 24221 - 2112 ( sửa chữa nhà xưởng Space T 07-2016 ) BR2

Using the utf8mb4 character set with the utf8mb4_unicode_ci collation is usually one of the simpler ways to ensure that your database can correctly store and display everything from plain ASCII to modern emoji.
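
The CREATE statements only cover the database and table side. LOAD DATA interprets the file using the character_set_database system variable unless it is given a CHARACTER SET clause, and SET NAMES does not affect how the file is read, so it may help to name the file's encoding explicitly. A minimal sketch, assuming the CSV is saved as UTF-8, has a header row, and lists its columns in the order code, name, type; the file name and the delimiters are hypothetical and not taken from the repository:

```sql
-- Make the client connection use utf8mb4 so query results come back as UTF-8.
SET NAMES utf8mb4;

-- Hypothetical file name and column layout; adjust both to the real CSV.
LOAD DATA LOCAL INFILE 'fixedAssets.csv'
    INTO TABLE `fixedAssets`
    CHARACTER SET utf8mb4              -- encoding of the file itself (assumed UTF-8)
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\n'
    IGNORE 1 LINES                     -- skip the header row (assumed to exist)
    (`code`, `name`, `type`);
```

If the file was saved in another encoding, the CHARACTER SET clause should name that encoding instead. Note that LOAD DATA cannot read files in ucs2, utf16, utf16le, or utf32, so a file in one of those encodings would have to be re-saved as UTF-8 first.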

Thank you for your suggestion of using a different character set, but I have found that the main cause of this bug may be the CSV files. I have converted them into JSON files, and the Vietnamese sentences now display correctly. — Fivermori, Aug 12 '21 at 09:08
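
One way to check whether a file's encoding is the culprit is to look at the bytes that actually ended up in the table. A sketch, assuming the `fixedAssets` table from the answer above: in correctly stored UTF-8 text, the "á" in "tháng" shows up as C3A1 in the HEX output, whereas a longer sequence such as C383C2A1 would suggest the file was read as a different encoding than the one it was saved in.

```sql
-- Compare the displayed text with the raw stored bytes.
SELECT `name`,
       HEX(`name`)         AS stored_bytes,
       CHAR_LENGTH(`name`) AS char_count,   -- number of characters
       LENGTH(`name`)      AS byte_count    -- number of bytes
FROM `fixedAssets`
LIMIT 5;

-- Show the character sets used by the client, connection, results, and server.
SHOW VARIABLES LIKE 'character_set%';
```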
