
I have a legacy database that claims to have collation set to windows-1252 and is storing a text field's contents as

I’d

When it is displayed in a legacy web app, it shows as I’d in the browser. The browser reports a page encoding of UTF-8. I can't figure out how that conversion is being done (I'm almost certain it isn't via an on-the-fly search-and-replace). This is a problem for me because I am moving the text field (and many others like it) from the legacy database into a new UTF-8 database. A new web app displays the text from the new database as

I’d

and I would like it to show as I’d. I can't figure out how the legacy app achieves this (some fiddling in Ruby hasn't shown me a way to convert the string I’d to I’d).

I've tied myself in a knot here somewhere.

Ben
    Your dbase contains junk. Inserted by a program that ignored the encoding and used utf8. And as luck would have it, read by a program that ignored it as well. Not usually luck. It works 99% right. – Hans Passant Jan 12 '15 at 00:46

1 Answer

It probably means the previous developer screwed up data insertion (or you're screwing up somewhere). The scenario goes like this:

  • the database connection is set to latin1
  • app actually sends UTF-8 to database
  • database interprets received data as latin1, stores it as such (interprets ’ as ’)
  • app queries for the data again
  • database returns ’ encoded in latin1
  • app interprets the data as UTF-8, resulting in ’
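
The round trip above can be reproduced in Ruby as a rough sketch. It assumes the database's "latin1" is really Windows-1252 (as the question states), and the variable names are purely illustrative:

```ruby
# The text the app meant to store: "I’d" (U+2019 is the curly apostrophe).
correct = "I\u2019d"

# The app sends UTF-8 bytes, but the database is told they are Windows-1252.
# force_encoding only relabels the bytes; it does not change them.
misread = correct.dup.force_encoding("Windows-1252")

# A UTF-8 connection later asks for the same column, so the database
# transcodes each misread byte into its own UTF-8 character.
mojibake = misread.encode("UTF-8")

puts mojibake
```

The three bytes of the apostrophe (E2 80 99) come back as three separate characters, which is the garbled text the new app displays.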

You essentially need to do the same misinterpretation to get good data. Right now you may be querying the database through a utf8 connection, so the database returns ’ encoded in UTF-8. What you need to do is query through a latin1 connection and interpret the data as UTF-8 instead.
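
If the data has already been read over a UTF-8 connection, the same misinterpretation can be undone in Ruby after the fact. This is a minimal sketch, not a definitive fix: `repair_mojibake` is a hypothetical helper name, and it assumes the garbling went through Windows-1252 exactly once.

```ruby
# Hypothetical helper: reverse one round of "UTF-8 read as Windows-1252".
def repair_mojibake(text)
  # Turn each character back into the single byte it was misread from,
  # then relabel the resulting byte sequence as the UTF-8 it really is.
  repaired = text.encode("Windows-1252").force_encoding("UTF-8")
  # If the bytes are not valid UTF-8, the text was probably not garbled
  # this way, so keep the original.
  repaired.valid_encoding? ? repaired : text
rescue EncodingError
  # Some character has no Windows-1252 byte; leave the text untouched.
  text
end

puts repair_mojibake("I\u00E2\u20AC\u2122d")
```

One caveat: a few bytes (for example 0x9D, which occurs in the UTF-8 form of the right curly double quote) have no character assigned in Windows-1252, so text containing them may not survive the trip and this helper would return it unchanged. Reading through a latin1 connection, as described above, avoids that problem because the database hands back the original bytes.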

See Handling Unicode Front To Back In A Web App for a more detailed explanation of all this.

deceze
  • Yes that makes sense thanks @deceze. I'd like to do a one off conversion then. Don't want to touch the old database. – Ben Jan 12 '15 at 07:34