
I use MySQL 5.7, but I do not know how to configure it to display Vietnamese text correctly.

I created the database with the following statement:

CREATE DATABASE brt
    DEFAULT CHARACTER SET utf8 COLLATE utf8_vietnamese_ci;

After that, I used "LOAD DATA LOCAL INFILE" to load data written in Vietnamese into the database.

But the Vietnamese characters in the query results are often displayed incorrectly.
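A minimal sketch of the load step that states the file's encoding explicitly instead of relying on the default. The file name `data.csv`, the table `fixedAssets`, the column list `(code, name, type)`, and the comma/quote/line-ending settings are all hypothetical placeholders. The CHARACTER SET clause only helps if the file really is saved as UTF-8.

```sql
-- Assumptions (hypothetical): data.csv is UTF-8 encoded, has one header row,
-- and contains comma-separated fields optionally wrapped in double quotes.
LOAD DATA LOCAL INFILE 'data.csv'
    INTO TABLE fixedAssets
    CHARACTER SET utf8mb4          -- how MySQL should decode the bytes in the file
    FIELDS TERMINATED BY ','
           OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\r\n'     -- use '\n' if the file has Unix line endings
    IGNORE 1 LINES                 -- skip the header row
    (code, name, type);            -- hypothetical target columns
```

Without the CHARACTER SET clause, MySQL interprets the file using the `character_set_database` value. If the file was exported in a legacy code page rather than UTF-8, which spreadsheet tools can do when saving CSV, the characters will still be decoded wrongly whatever clause is used.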

For the full code and data files, please see my GitHub repository at the following link:

https://github.com/fivermori/mysql

Please show me how to solve this. Thanks.

  • You must take many parameters into account: console/client code page settings, possible client-side pre-conversions, connection and server charset settings, and so on... – Akina Aug 04 '21 at 06:39
  • This may not relate to your problem, but avoid MySQL's original, crippled "utf8" character set and use "utf8mb4" instead. – ysth Aug 04 '21 at 06:45
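Following the first comment, here is one way to check which character sets the server, database and connection are actually using, and to switch the current session to UTF-8. This is a sketch that assumes the terminal or client application itself displays UTF-8. The `mysql` command-line client can also be started with `--default-character-set=utf8mb4` to get a similar effect to `SET NAMES`.

```sql
-- Inspect the character sets and collations currently in effect
-- (client, connection, database, results, server, ...).
SHOW VARIABLES LIKE 'character_set_%';
SHOW VARIABLES LIKE 'collation_%';

-- Tell the server that this session sends and expects UTF-8.
-- SET NAMES sets character_set_client, character_set_connection
-- and character_set_results together.
SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Re-check: the client, connection and results rows should now read utf8mb4.
SHOW VARIABLES LIKE 'character_set_%';
```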


As @ysth suggests, using utf8mb4 will save you a world of trouble going forward. If you change your create statements to look like this, you should be good:

CREATE DATABASE `brt` DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
USE `brt`;

DROP TABLE IF EXISTS `fixedAssets`;
CREATE TABLE IF NOT EXISTS `fixedAssets` (
    `id`            int(11)        UNSIGNED NOT NULL    AUTO_INCREMENT,
    `code`          varchar(250)     UNIQUE NOT NULL    DEFAULT '',
    `name`          varchar(250)            NOT NULL    DEFAULT '',
    `type`          varchar(250)            NOT NULL    DEFAULT '',
    `createdDate`   timestamp               NOT NULL    DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
CREATE INDEX `idx_fa_main` ON `fixedAssets` (`code`);

I've tested this using the data that you provided and got the expected query results:

name
----------------------------------------------------------------
Mould Terminal box cover BN90/112 612536030 39 tháng
Mould W2206-045-9911-VN #3 ( 43 tháng)
Mould Flange BN90/B5 614260271 ( 43 tháng)
Mould 151*1237PH04pC11 ( 10 năm)
Transfer 24221 - 2112 ( sửa chữa nhà xưởng Space T 07-2016 ) BR2

Using the utf8mb4 character set with the utf8mb4_unicode_ci collation is one of the simplest ways to ensure that your database can store and display everything from plain ASCII to modern emoji. utf8mb4 covers the full Unicode range, including the 4-byte characters that MySQL's older 3-byte "utf8" cannot store.
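If the database and table already exist with the older `utf8` character set, as in the question, they can be converted in place instead of being recreated. This is a sketch assuming the `brt` database and `fixedAssets` table above. `CONVERT TO` rewrites the table, so back it up first. It only re-encodes the characters already stored, so text that was garbled on the way in stays garbled.

```sql
-- Change the default for tables created later in this database.
ALTER DATABASE `brt` CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Convert the existing table and all of its character columns.
ALTER TABLE `fixedAssets` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Confirm the column-level character sets and collations.
SHOW FULL COLUMNS FROM `fixedAssets`;
```

On a table that uses the older COMPACT row format, the unique index on the 250-character `code` column may exceed InnoDB's 767-byte key limit once each character can take 4 bytes. The DYNAMIC row format, the default in recent 5.7 releases, allows longer keys.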

matigo
  • Thank you for suggesting a different character set, but I have found that the main cause of this bug is probably the CSV files. After I converted them into JSON files, the Vietnamese sentences display correctly. – Fivermori Aug 12 '21 at 09:08
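To confirm whether the problem comes from the input file rather than from display settings, one option is to look at the raw bytes stored in the column. This sketch assumes the `fixedAssets` table from the answer. Correctly stored UTF-8 encodes `á` as `C3A1`. Text that was misread as Latin-1 and then re-encoded typically shows `C383C2A1` instead; a file in some other encoding produces other unexpected byte sequences.

```sql
-- Show stored text next to its raw bytes; HEX() reports the bytes as stored,
-- independent of how the client displays them.
SELECT name, HEX(name) AS name_bytes
FROM fixedAssets
LIMIT 5;

-- Reference bytes for a correct UTF-8 'á'
-- (assumes this session's client really sends UTF-8).
SELECT HEX(CONVERT('á' USING utf8mb4)) AS expected_bytes;
```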