4

I'm trying to move a MySQL 3.23.58 database to a different server running 5.5.19.

The old one has latin1 encoding specified, and as far as I can tell the underlying data is indeed honestly latin1. I have tried many things, chiefly:

  • exporting from terminal with mysqldump and the latin1 encoding flag.
  • editing in vim to change "TYPE=InnoDB" to "ENGINE=InnoDB" for MySQL 5 compatibility.
  • importing to the new server from terminal.

Browsing the old server (in Sequel Pro for Mac, or MySQL Query Browser on PC), special characters don't always show properly, but they're there (looking at the binary in hex). (And in any case it works with the PHP web app.)

Browsing the new server, all special characters appear to have been replaced by question marks. I know that sometimes special characters can display as a question mark (or �) if the wrong encoding is specified. But these appear to be genuine straight-up encoded ASCII question marks on a binary level. The special characters (chiefly curly quotation marks and dashes) appear to have been lost, or destroyed, in the export/import.

Any idea why?

I know there are many things that can go wrong with encoding, with many different things at fault. I have been reading about this for several days (here and elsewhere) and tried setting all the right character encodings, tried UTF-8, tried casting and converting, tried Sequel Pro's export/import (as opposed to the terminal), etc. But I am stumped.

Toph
  • 2,561
  • 2
  • 24
  • 28
  • If you export as SQL statements do you see the same issue? From your question it sounds like the exported file is OK (you've looked in hex editor), but it's the import that's creating the problem. I fail to see why an SQL INSERT statement would fail if it is a plain text file on disk and all chars are presented in UTF-8 or latin1. Try it with one record you are having trouble with. – Brad Jun 25 '12 at 08:14
  • This is what an apostrophe (or right single quotation mark) looks like in the exported file in vim (screenshot): http://cl.ly/1C2m0d1M2y0g1J1C3d0P -- a <92>. Is that some kind of vim digraph? (Quadgraph?) It doesn't match anything here: http://vimdoc.sourceforge.net/htmldoc/digraph.html#digraph-table – Toph Jun 26 '12 at 08:42
  • And an em dash shows up as <97>. – Toph Jun 26 '12 at 08:50
  • Does MySQL 3.23.58 have `character_set_client`, and if so, what is it set to when you're viewing the data with Sequel Pro or MySQL Query Browser? What is the encoding set to in the client program? When you import the data, you're doing `mysql --default-character-set=latin1 ...`? What is the character encoding of the database you're importing into and of the table that you're viewing when you see the `?` chars? – JMM Jun 28 '12 at 01:39

1 Answers1

2

Good, it looks like we've narrowed down your problem. I found this post

If your text editor is vim, then most likely the "<92>" is the hexadecimal code of an extended ASCII character. In this case, it is Hex(92) or Oct(222) or Dec(146) , which is "right single quotation mark"; not to confused with "single quote" which is ASCII Dec code 39.

One way to remove all non-ASCII characters from your file could be -

perl -plne 's/[^[:ascii:]]//g' <your_file>

Otherwise just search and replace "<92>" and "<97>" in your exported file with an appropriate character.

[Edit]

I'm not a VIM user but this post addresses the issue of replacing the <92> smart quote characters

For each value that you see in your file, just do a string substitution, like so:

:%s/<93>/\’/g

of course, you can’t just type that <93> in there, so to get it in there you use

CTRL-V x 93

which inserts hex 93 in place.

In recently exported CSV’s from excel, I’ve seen hex 91-97.

Brad
  • 15,186
  • 11
  • 60
  • 74
  • Thanks for your help Brad. Unfortunately, removing all non-ASCII characters isn't an option. Search and replace could work, but I haven't figured out how it works with extended ASCII characters in vim. It can't find them if I just type <92> etc. – Toph Jun 28 '12 at 09:49
  • Comment on [Edit]: Ah, excellent, thanks so much! I also had to set file encoding in vim to utf8 for it to save (maybe that had been a problem all along? I thought I'd taken care of it) and make sure to SET NAMES 'utf8' for mysql, but that fixed the crux of it. Lookin' good. – Toph Jun 29 '12 at 21:14