0

I've recently converted a MySQL database over from latin1 to utf-8. All has gone well apart from a blob column. In an example, the field contains erroneous characters displayed as a 'question mark in a diamond'.

If I take this data from the blob, run it through PHP's utf8_encode() it displays fine. Furthermore, if I then save the data from utf8_encode() into the blob field, I no longer need to uf8_encode() the data to display it.

How can I convert the blob data without the need of PHP intervention?

Update

I tried copying the BLOB over to a TEXT field, but the error still exists.

To assist, here is example data from the BLOB:

HEX:

613A313A7B733A373A22636F6E74656E74223B733A3333333A223C70207374796C653D22746578742D616C69676E3A2063656E7465723B223E0A093C7370616E207374796C653D22636F6C6F723A20726762283235352C203235352C20323535293B223E3C7370616E207374796C653D22666F6E742D73697A653A20323470783B223E4672696C616E736F707064726167202D2042696C646561726B69763C2F7370616E3E3C2F7370616E3E3C2F703E0A3C70207374796C653D22746578742D616C69676E3A2063656E7465723B223E0A093C62723E0A093C7370616E207374796C653D22666F6E742D73697A653A20323070783B223E3C7370616E207374796C653D22636F6C6F723A20726762283231312C203231312C20323131293B223E496E666F726D61736A6F6E202D20446F6B756D656E7461736A6F6E202D20507265737365202D20466F727374F87272656C7365723C2F7370616E3E3C2F7370616E3E3C2F703E0A223B7D

Expected value:

a:1:{s:7:"content";s:333:"<p style="text-align: center;">
    <span style="color: rgb(255, 255, 255);"><span style="font-size: 24px;">Frilansoppdrag - Bildearkiv</span></span></p>
<p style="text-align: center;">
    <br>
    <span style="font-size: 20px;"><span style="color: rgb(211, 211, 211);">Informasjon - Dokumentasjon - Presse - Forstørrelser</span></span></p>
";}

In this example, I end up with the following text displayed:

Frilansoppdrag - Bildearkiv

Informasjon - Dokumentasjon - Presse - Forst�rrelser

Using a few PHP functions, I'm being told the data is UTF8, the page I'm displaying in is also UTF8. So it's a bit confusion why I'm getting the errors for the special char. Furthermore, If I change the page encoding from UTF-8 to ISO-8859-15, the characters display fine.

halfer
  • 19,824
  • 17
  • 99
  • 186
David
  • 16,246
  • 34
  • 103
  • 162
  • An off topic side question: what are the reasons to save text in a blob field and not into text one (see http://stackoverflow.com/questions/7071662/mysql-text-vs-blob-vs-clob )? – Reporter Sep 16 '16 at 12:20
  • 1
    The whole point of BLOB is that it doesn't have a character set associated with it.. I agree with @reporter - what's the point in using a BLOB if you **need** charset with it? Convert the blob into text and be done with the problem. That's how you do it properly. – Mjh Sep 16 '16 at 12:31
  • @Mjh I've just done a strait data type switch to text and half the data is now truncated. So there is clearly more to simply switching it from blob to text. – David Sep 16 '16 at 12:42
  • Alter table, add the `text` column, run `UPDATE mytable SET new_text_column = my_blob_column;`. I never suggested to swap from one data type to another by issuing a conversion, that always comes with truncation.. pour from one bucket to another, don't swap out the bucket :) Also, if I may know - how come you used a BLOB type if it wasn't required? Could there be another problem that can arise from this data type swap? – Mjh Sep 16 '16 at 12:48
  • @Mjh No worries, I have a back up:) I think it was done due to special characters, but I can't be sure. A previous developer's idea. – David Sep 16 '16 at 12:58
  • @David Is it possible to add a column with text datatxpe and convert the content from blob column with php? After that change the code so the program will write any text into the blob field anymore – Reporter Sep 16 '16 at 13:53
  • @reporter Yes, that could be done. But i'm looking for something that doesn't involve PHP's intervention – David Sep 19 '16 at 11:57
  • @David what is problem to use php a last time and then change the source code from your application (as I suggested)? – Reporter Sep 19 '16 at 12:13
  • @reporter I'm currently running the process on a small database and documenting the process in which I will then use on a rather large database (100s of tables with millions of rows each). It's important to keep the process as quick as possible and thus I want to remove the need for PHP processing time. While, yes, PHP would be the easy solution, it's not the pure SQL (or even Linux) process that i'm looking for. – David Sep 19 '16 at 12:16

1 Answers1

0

You can't convert a BLOB to TEXT without understanding the encoding in the blob. And if the 'text' in the BLOB is encoded in a variety of ways, it is not possible to convert to TEXT without losing some of the text.

Look at the HEX() of the blob. If you have, for example, é, and the HEX comes out as E9, then the text was probably encoded as latin1. If, instead, you get C3A9, then you probably have utf8 (or utf8mb4). Without doing this analysis first, you risk destroying (via truncation or turning into question marks) the "text" in your BLOB. That will lose data!

The alter looks like:

ALTER TABLE t MODIFY c TEXT CHARACTER SET latin1  NOT NULL;  -- if E9
ALTER TABLE t MODIFY c TEXT CHARACTER SET utf8mb4 NOT NULL;  -- if C3A9

Caution:

  • Be sure to have a backup; if something goes wrong; data could be lost.
  • Change t and c to your table and column names.
  • Add on NULL or NOT NULL to match the current declaration.
  • Carry forward any other things on the declaration.
  • If my E9/C3A9 example is not clear enough, let's see some HEX for further discussion (and include what you think the text should be.)
Rick James
  • 135,179
  • 13
  • 127
  • 222
  • The `F9` is latin1 (or latin5 or dec8) hex for `ø`. I take it that you are expecting the last word to be `Forstørrelser`? So, probably my first `ALTER` is approximately what you need. – Rick James Sep 20 '16 at 05:18