
We've got a system written in PHP which has grown pretty large over the years. The database is MySQL (InnoDB tables) with currently over 12GB of data; there are hundreds of tables, many of which have over 12 million records!

The problem is, a lot of the tables/columns (but not all) are in latin1, and we're (obviously) getting issues storing foreign characters.

What's the best way for us to convert all the tables/text columns into UTF8, with the shortest downtime possible?

The system is used by hundreds of people 24/7, so lengthy downtime is really not an option.

Is there any way of doing this successfully without a ton of downtime, and are there any obvious things we need to be careful of?

I know we'll need to set the following things to make our application use utf-8:

  • <meta http-equiv="content-type" content="text/html; charset=utf-8" />
  • ini_set('default_charset', 'utf-8');
  • SET NAMES utf8
  • Files encoded as UTF-8 in our IDE before uploading

I've read various other posts, but everyone seems to be suggesting different methods, some saying it needs a full database dump/restore... which is going to mean hours and hours of downtime.

So what's the best way to do this?

BT643
  • Did you try reading this question: http://stackoverflow.com/questions/2543993/convert-a-mysql-database-from-latin-to-utf-8 – Mladen B. Jun 10 '13 at 10:58
  • I was asking if there was any method of doing this without dumping and reimporting the entire database, though. A full restore is probably going to mean about 15+ hours of downtime I'm guessing. We have a backup database server which we replicate everything to. I'm looking for a better solution than 15+ hours of downtime, maybe some clever replication ideas which make our backup copy UTF8 so we can just switch etc. – BT643 Jun 10 '13 at 11:06
  • You can convert individual tables one at a time with `alter table tablename convert to charset utf8`. This should be reasonably fast for smaller tables so you can do it with the database running, so only the big tables with millions of records are a problem. – Joni Jun 10 '13 at 11:25
  • @Joni: 12GB is 12 *billion* bytes, so the average record size is about 1KB in that case. – Michael Madsen Jun 10 '13 at 11:28
  • @MichaelMadsen you are right of course, now it makes sense. – Joni Jun 10 '13 at 11:30

1 Answer


You're hoping to do something that is possible, but very hard, and risky as well. Give up on clever: there is nothing magical that makes this easy. You're trading off downtime on the one hand against your labor cost and the risk of data loss on the other. Your labor cost will probably be ten times higher than it would be if you simply took the 15 hours of downtime.

Is it possible to write a SELECT query for every table that is guaranteed to retrieve every row that has been added or changed since a particular date/time, and do so quickly? If so, write this query for every table and keep it at hand. If not, you can't use this method.
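For example, if your tables carry an indexed `updated_at` timestamp (an assumption; many schemas don't have one), the per-table query might look like this sketch (table name is illustrative):

```sql
-- Sketch only: assumes an indexed `updated_at` column on the table,
-- and that rows are never hard-deleted (deletes need separate handling).
SELECT *
FROM orders
WHERE updated_at >= '2013-06-10 00:00:00';
```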

You can do this table by table.
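To see which tables and columns actually still need converting, you can ask information_schema; a sketch, assuming your schema is named `mydb`:

```sql
-- List every column still stored as latin1 (replace `mydb` with your schema name)
SELECT table_name, column_name, character_set_name, collation_name
FROM information_schema.columns
WHERE table_schema = 'mydb'
  AND character_set_name = 'latin1'
ORDER BY table_name, column_name;
```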

The small tables won't take long; you can probably convert them while your application is live, at off-peak hours. Just convert the columns in place.
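For a small table that's a single statement along these lines (table name is illustrative; note the statement locks the table while it runs):

```sql
-- In-place conversion of a small table; holds a table lock for the duration
ALTER TABLE settings CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
```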

If you have larger tables that are never updated, you can create copies of those tables with the desired charset and default collation for the columns in question. You can then copy the data with INSERT ... SELECT (http://dev.mysql.com/doc/refman/5.1/en/insert-select.html). Finally, with a few moments of downtime you can rename the production table, then give the new table the production table's name. You may need to do the copy in chunks of a few thousand rows to keep InnoDB's transaction integrity system from blowing out your server's RAM.
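A sketch of that sequence for a hypothetical read-only table `archive`:

```sql
-- Build an empty copy and convert it while it's still empty (near-instant)
CREATE TABLE archive_utf8 LIKE archive;
ALTER TABLE archive_utf8 CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;

-- Copy the data across (do this in chunks on big tables; see below)
INSERT INTO archive_utf8 SELECT * FROM archive;

-- The swap itself is atomic and takes only a moment
RENAME TABLE archive TO archive_old, archive_utf8 TO archive;
```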

Finally, you have to deal with your large and changing tables. Again, copy the tables with INSERT ... SELECT, again in a way where InnoDB transactions won't suspend your application's operation or blow out your RAM. The intention here is to have a snapshot of your table as of a certain date/time.
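One way to keep each transaction small is to walk the primary key in ranges; a sketch, assuming an auto-increment integer key named `id` (table names are illustrative):

```sql
-- Copy ~100k rows per statement; repeat, advancing the range,
-- until the lower bound passes SELECT MAX(id) FROM big_table
INSERT INTO big_table_utf8
  SELECT * FROM big_table WHERE id >= 1      AND id < 100001;
INSERT INTO big_table_utf8
  SELECT * FROM big_table WHERE id >= 100001 AND id < 200001;
-- ... and so on
```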

Then, shut down production. Use your handy-dandy query to select all rows inserted or changed since your start time, and insert/update them into your target table. Then rename your production table, give your target table the production table's name, and restart production. You should be able to do this fairly quickly.
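A sketch of the catch-up step, again assuming the `updated_at` column and a primary key (REPLACE does a delete-and-insert on key collisions, so rehearse it carefully if other tables have foreign keys pointing at this one):

```sql
-- Re-apply everything that changed after the snapshot copy started
REPLACE INTO big_table_utf8
  SELECT * FROM big_table
  WHERE updated_at >= '2013-06-10 00:00:00';

-- Swap the converted table into place
RENAME TABLE big_table TO big_table_old, big_table_utf8 TO big_table;
```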

If I were you, I'd make a staging copy of your live production database and rehearse every step of this procedure before doing it live. You're going to encounter trouble with foreign-key constraints in InnoDB, for example; you need to be able to work through those carefully.

Actually, if I were you I wouldn't try to do this. It's like replacing an airplane's engine while the airplane is flying. What could go wrong? :-) Instead I'd incur the scheduled web application downtime to get this conversion done. Even in that case you should try out the whole thing on a staging server before going live.

O. Jones
  • Ok, that makes sense, so if I go for the safer option and just take the downtime. What's the *correct* method for doing it? I'm just trying on our test server now to see roughly how long the downtime will be, using the method in this question: http://stackoverflow.com/questions/2543993/convert-a-mysql-database-from-latin-to-utf-8 But I'm getting some weird issues, it's taking *ages* (over 10 minutes so far) to drop a single table so it can re-import the data... with MySQL 5.5.29 on Ubuntu Server 12.04. – BT643 Jun 10 '13 at 14:45
  • Sounds to me like you should disable constraints (on your staging server). Try `SET FOREIGN_KEY_CHECKS = 0;` – O. Jones Jun 10 '13 at 19:00