0

I need to import files extracted created on a Windows platform into a MySQL 5.5 database created and running on a Centos server. I develop the scripts on my OSX 10.9 laptop running a MySQL 5.5 database server. The files contain Dutch texts, so they can have accented letters in the names of people.

I need a reliable way to confirm the import is correct on all platforms.

First attempt:

  • bash shell script (loadFiles.sh) created that drops database, recreates the db with the provided schema dump and loads the files using LOAD DATA LOCAL INFILE
  • db schema is created using latin1 encoding on OSX
  • files are imported as provided
  • accented names look good in Sequel Pro on OSX
  • running the loadFiles.sh script on the Centos server -> names are garbled
  • dumping the OSX version of the database and importing on the Centos server -> names are good

Second attempt:

  • db schema is changed to utf8
  • loadFiles.sh is expanded to test and convert the files using:

`

file $FILE | grep ISO-8859 >> /dev/null
NEEDCONVERSION=$(echo $1)

if [ "$NEEDCONVERSION == "0"]; then
  iconv -f LATIN1 -t UTF8 $FILE > $CONVERTEDFILE    
fi
  • converted files are imported
  • running the script on the Centos server -> all names look good
  • inspecting the files on the commandline using less in Centos -> names look fine
  • running the script on OSX -> names are garbled
  • inspecting the files on the commandline using less in OSX -> names look fine

I've tried:

  • different encoding with iconv (UTF8-MAC vs UTF8) -> the only difference is that the way the names are garbled is different
  • I followed the advise in this question (adding --default-character-set=utf8 to the MySQL import statement and making sure the character collation is correct), but it didn't solve my problem.

Update: when I change the way Sequel Pro views the data on OSX (basically it says the commands below), the names look fine again. I don't need to do this when viewing the data on the Centos server using Sequel Pro.

set character_set_result=latin1;set character_set_client=latin1;

Any help would be greatly appreciated.

Community
  • 1
  • 1
hepabolu
  • 1,214
  • 4
  • 18
  • 29

1 Answers1

0

Since I could finish the job with the described workaround I didn't investigate further. But it turns out that the default setting for the database is 'latin1'. That might be the clue.

i.e.: character_set_database is set to latin1

hepabolu
  • 1,214
  • 4
  • 18
  • 29