5

I am trying to move some data from MSSQL to MySQL. When I'm running wbcopytables.exe the charset on mysql connection seems to be wrong, I'm getting an error when the data contain emoji icons (http://apps.timwhitlock.info/unicode/inspect?s=%F0%9F%8C%A8%E2%98%83%EF%B8%8F):

Incorrect string value: '\xF0\x9F\x8C\xA8\xE2\x98...' for column 'Value' at row 4

My server, database, table and column have charset=utf8mb4 and collation=utf8mb4_unicode_ci. However I can insert those emoji icons to that table with a .NET application I created on the side, so it is not a matter of schema/server settings. This makes me think wbcopytables.exe is enforcing some other encoding (possibly utf8) on the connection. I tried changing all mysql variables to enforce utf8mb4 as suggested in other SO questions with these in my.ini:

[mysqld]
character-set-server=utf8mb4
collation-server=utf8mb4_unicode_ci
init_connect='SET collation_connection = utf8mb4_unicode_ci'
skip-character-set-client-handshake

EDIT: Here is more information about the wbcopytables.exe itself: https://dev.mysql.com/doc/workbench/en/wbcopytables.html

andy250
  • 19,284
  • 2
  • 11
  • 26
  • Who owns `wbcopytables`? Where's the docs for it? – Rick James Feb 04 '17 at 02:04
  • I have updated the question. `wbcopytables.exe` is a CMD utility shipped with MySQL Workbench. – andy250 Feb 06 '17 at 08:18
  • Is the .NET application the only evidence you have that source data is correctly encoded? Are you at least using `NVARCHAR` columns and setting SQL Server connection encoding as Unicode? – Álvaro González Feb 06 '17 at 09:00
  • Yes it is the only evidence. MSSQL has nvarchar columns. Why encoding on mssql connection would matter? In the error message I can see I am getting the bytes that represent the emoji. I can do insert of the same byte sequence from .NET. The problem lies in the mysql insert. – andy250 Feb 06 '17 at 09:20
  • Why should source data integrity matter? Because migration tools are normally not written to figure out how to fix bad data. I'm not stating that data is wrong but, given that it isn't working, it thought it'd be worth to make sure and not just guess. – Álvaro González Feb 06 '17 at 09:51
  • Does the `--force-utf8-for-source` parameter make any change? – Álvaro González Feb 06 '17 at 09:56
  • Did you ever figure this out? I am having the same problem (though with SQL Anywhere, not MS SQL, but the dialects are similar). – Erica Kane Sep 07 '17 at 19:16
  • It's been a while, I think I abandoned wpcopytables altogether and implemented a custom script in C#. – andy250 Sep 08 '17 at 11:20
  • https://stackoverflow.com/questions/46866272/mysql-migration-wizard – tsu1980 Dec 25 '17 at 06:59
  • see bug report: https://bugs.mysql.com/bug.php?id=87593 bug is verified but no fix yet – user10209715 Aug 10 '18 at 17:16

1 Answers1

0

I had a similar problem, it turned out that my source database had latin1 encoding, but the application had allowed in UTF8 data. Needless to say this caused problems. I had to export the data from the source database into a CSV and then use LOAD DATA LOCAL INFILE instead with the character encoding specified. E.g.

load data local infile 
'C:\\SentryLink Search\\Git\\WebApp\\sql\\data_subject_address_clean.csv'
replace into table subject_address 
character set latin1
fields terminated by ',' OPTIONALLY ENCLOSED BY '\''
escaped by '|'
lines terminated by '\n';

Note that I used the pipe character | as the escape character as it is less likely to appear in data, that said, every now and then a row would have |' so as a prelimimary step one had to get rid of those rows, or edit them, using grep or similar command.

I can't say it is fun, but it works.

Erica Kane
  • 3,137
  • 26
  • 36