0

I have Percona XtraDB 5.6 server with very old database with charset set to utf8 and data encoded in different charset (probably latin1).

I tried to migrate the database to new Percona 8.0 server but after importing the SQL file, all diacritic marks become broken on the 8.0 server. I have resolved the issue by executing this query on every column in every table:

UPDATE table SET col = convert(cast(convert(col using latin1) as binary) using UTF8) WHERE 1;

BUT there is one table with binary data (specifically GZIP compressed data) saved into LONGTEXT columns. Data from this columns always becomes damaged after import on the new server.

This is what I tried so far:

  • changing column type to LONGBLOB before dump.
  • using the above query to convert the data before/after column type change.

This is the command I'm using to export DB:

mysqldump --events --routines --triggers --add-drop-database --hex-blob --opt --skip-comments --single-transaction --skip-set-charset --default-character-set=utf8 --databases "%s" > db.sql

Please note the "--hex-blob" option which still results in binary data being exported as strings, instead of hex.

1 Answers1

0

It would not have been damaged by zip/unzip. But it could have been damaged in any of a number of other ways.

"--hex-blob" turns data into strings such that they will not be mangled until you reload them.

Dumping, loading, INSERTing, SELECTing all need to be told what character set to use.

The particular UPDATE you did may or may not have made things worse. Here is a list of cases that I have identified, and the minimal "fix":

http://mysql.rjweb.org/doc.php/charcoll#fixes_for_various_cases

Rick James
  • 135,179
  • 13
  • 127
  • 222