4

I want to transfer a 3.23.49 MySQL database to a 5.0.51 MySQL database. Now I have exported the SQL file and I'm ready for import. I looked in the sql-file and Notepad++ shows me that the files is encoded in ANSI. I looked in the values and some of them are in ANSI and some of them are in UTF-8. What is the best way to proceed?

  1. Should I change the encoding within Notepad++?
  2. Should I use ALTER TABLE tbl_name CONVERT TO CHARACTER SET utf8;?
  3. Should I use iconv?
  4. Do I have to look through each table and make the necessary changes?
  5. Whate are the settings for the import? MYSQL323 compatibility mode and encoding latin1?
  6. Do I have to be aware of something if the php-scripts are using another encoding?

Thank you for your hints!

testing
  • 19,681
  • 50
  • 236
  • 417
  • What do you mean by `some of them are in ANSI and some of them are in UTF-8`? ANSI is a mode of `mysqldump` which has no relation to the encoding of the file, as soon as I know. – newtover Nov 17 '11 at 16:36
  • The file is according to Notepad++ encoded in ANSI. I have entries with umlauts (äöü...), which are displayed correctly. In another table I have a ü whis is represented as `ü` - a UTF-8 encoding. If I change the encoding to UTF-8 with Notepad++ the UTF-8 characters are OK but the ANSI one are like `xF6.`This is how the situation looks like. – testing Nov 17 '11 at 16:48
  • 1
    then you probably mean ascii, or better a single-byte encoding ) – newtover Nov 17 '11 at 16:52
  • Do these tables (columns really), where umlauts look different have different or the same encodings? If encodings are the same, then your data is pretty much corrupted and you'll need to selectively (n specific parts of it) use iconv to convert it to one encoding. – Mchl Nov 17 '11 at 16:57
  • The database has different tables. One table has than ASCII, another UTF-8. I know that the system has some encoding problems and now I want to solve it. But the same table has the same encoding. – testing Nov 17 '11 at 17:32

2 Answers2

5

If the problem is to import a utf8-encoded mysql dump, the solution is usually to add --default-character-set=utf8 to mysql options:

mysql --default-character-set=utf8 -Ddbname -uuser -p < dump.sql

UPD1: In case the dump file is corrupted, I would try to export the database once again table by table so that the dump would result in a correct utf8 encoded file.

newtover
  • 31,286
  • 11
  • 84
  • 89
  • I only have phpmyadmin to import the sql-dump. My problem is that there are two different encodings and I think they should be all merged to UTF-8. – testing Nov 17 '11 at 16:51
  • I tried to export one table (with ANSI encoding). Notepad++ shows me ANSI encoding. If I export one of the UTF-8 tables Notepad++ shows me UTF-8 without BOM. So should I look through each table and export table by table? Then should I change the encoding from ANSI to UTF-8 where necessary (with iconv?)? – testing Nov 18 '11 at 15:40
  • @testing, yes, I would do it that way. – newtover Nov 18 '11 at 15:50
3

I have converted a MySQL 4.0 database (which also had no notion of character encoding yet) to MySQL 5.0 four years ago, so BTDT.

But first of all, there is no "ANSI" character encoding; that is a misconception and a misnomer that has caught on from the early versions of Windows (there are ANSI escape sequences, but they have nothing to do with character encoding). You are most certainly looking at Windows‑1252-encoded text. You should convert that text to UTF‑8 as then you have the best chance of keeping all used characters intact (UTF‑8 is a Unicode encoding, and Unicode contains all characters that can be encoded with Windows-125x, but at different code points).

I had used both the iconv and recode programs (on the Debian GNU/Linux system that the MySQL server ran on) to convert Windows‑1252-encoded text of a MySQL export (created by phpMyAdmin) to UTF‑8. Use whatever program or combination of programs works best for you.

As to your questions:

  1. You can try, but it might not work. In particular, you might have trouble opening a large database dump with Notepad++ or another text editor.
  2. Depends. ALTER TABLE … CONVERT TO … does more than just converting encodings.
  3. See the paragraph above.
  4. Yes. You should set the character encoding of every table and every text field that you are importing data into, to utf8 (use whatever utf8_… collation fits your purpose or data best). ALTER TABLE … CONVERT TO … does that. (But see 2.)
  5. I don't think MYSQL323 matters here, as your export would contain only CREATE, INSERT and ALTER statements. But check the manual first (the "?" icon next to the setting in phpMyAdmin). latin1 means "Windows-1252" in MySQL 5.0, so that might work and you must skip the manual conversion of the import then.
  6. I don't think so; PHP is not yet Unicode-aware. What matters is how the data is processed by the PHP script. Usually the Content-Type header field for your generated text resources using that data should end with ; charset=UTF-8.

On an additional note, you should not be using MySQL 5.0.x anymore. The current stable version is MySQL 5.5.18. "Per the MySQL Support Lifecycle policy, active support for MySQL 5.0 ended on December 31, 2009. MySQL 5.0 is now in the Extended support phase." MySQL 5.0.0 Alpha having been released on 2003-12-22, Extended Support is expected to end 8 full years after that, on 2011‑12‑31 (this year).

PointedEars
  • 14,752
  • 4
  • 34
  • 33
  • @testing The source code in point no. 2 intentionally is a link to the corresponding section of the MySQL manual. I do not think I could (or should try to) explain it better. – PointedEars Jan 13 '12 at 09:52