
I'm working on a large dataset encoded in UTF-16LE that holds 1 billion records containing text strings in over 50 languages (not all of them known to me).

I need to get these into our database, MySQL 5.7, using LOAD DATA INFILE (for import speed), but I just found out that MySQL does not support the UTF-16LE text encoding: I hit this while trying to load the file with the Workbench import tool. Querying this data with Athena also returns no records with this encoding.

  1. Which encoding works best with MySQL 5.7, handles multilingual text, and can be used with LOAD DATA INFILE?
  2. Will this keep the text safe and not garble the text strings?
user15793580
  • Does this answer your question? [failed to import utf16 encoded file into mysql](https://stackoverflow.com/questions/12447733/failed-to-import-utf16-encoded-file-into-mysql) – JosefZ May 02 '21 at 14:27
  • Thanks for the reply, I saw that. In PowerShell I ran: Get-Content .\test.csv -Encoding Unicode | Set-Content -Encoding UTF8 .\test_encoded.csv, then created a new database in Workbench with utf8mb4 and utf8mb4_general_ci, then a new table with utf8mb4. I tried importing the test.csv file both before and after the PowerShell UTF-8 conversion, and I get "unhandled exception charmap codec can't decode byte 0x9d in position 6074, character maps to undefined". – user15793580 May 02 '21 at 14:51
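
For reference, the re-encoding step from the PowerShell command in the comment above could also be done as a streaming conversion, which may matter for a file with 1 billion records. The following is only a minimal sketch, assuming the source file really is UTF-16LE (with or without a byte order mark); the function name `transcode_utf16le_to_utf8` and the chunk size are hypothetical choices, not something from the question.

```python
def transcode_utf16le_to_utf8(src_path, dst_path, chunk_chars=1 << 20):
    """Re-encode a UTF-16LE text file as UTF-8, reading it in chunks.

    Hypothetical helper: it assumes the input is valid UTF-16LE and
    raises UnicodeDecodeError instead of silently garbling bad data.
    """
    # newline="" on both sides keeps the original line endings untouched,
    # so the line terminators seen by the importer do not change.
    with open(src_path, "r", encoding="utf-16-le", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        first_chunk = True
        while True:
            chunk = src.read(chunk_chars)  # at most chunk_chars characters
            if not chunk:
                break
            if first_chunk:
                # The "utf-16-le" codec does not strip a leading BOM,
                # so drop it here to keep it out of the first field.
                if chunk.startswith("\ufeff"):
                    chunk = chunk[1:]
                first_chunk = False
            dst.write(chunk)
```

Calling `transcode_utf16le_to_utf8("test.csv", "test_encoded.csv")` would write a UTF-8 copy without a BOM. Because the decoding is strict, any byte sequence that is not valid UTF-16LE stops the conversion with an error rather than producing garbled text. The encodings are passed explicitly on purpose: a "charmap codec" error like the one quoted above typically appears when a file is decoded with a platform default code page instead of the encoding it was written in.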

0 Answers