0

I have a MySQL database with an InnoDB table containning utf8_general_ci varchar fields. When I fetch them through PHP (via PEAR::MDB2) and try to output them (via Smarty), I get ??? symbols. I would like to know how to fix that problem, which is most likely caused by PHP.

Good information to know:

  • It is a new version of the site I'm working on, the old version had the same problem even though it didn't use Smarty nor MDB2, so they are most likely not the cause. The old programmer used htmlentities() to remedy the problem, but I'm trying to avoid that.
  • The character encoding of all my files (template, source, etc.) is UTF-8 without BOM.
  • When I display a page, all accented characters (the ones in the templates, not the ones coming from MySQL) are shown correctly and the encoding in the browser is UTF-8. If I manually switch it over to ISO-8859-1, then the character from MySQL are outputed correctly, but no the others.

Basically, it seems that PHP or MySQL transforms the UTF-8 data contained within the database to ISO-8859-1 at some point during the query/fetch process, and that is what I want to fix.

I've done a lot of searching but haven't found any solution, and I'm hoping the problem lies in a setting somewhere. I'd like to avoid having to use htmlentities() or utf8_encode(), however that might be the only way to go until PHP6 shows up.

Thank you for your input on this!

Mathieu M-Gosselin
  • 1,225
  • 1
  • 13
  • 17

2 Answers2

1

You need to execute a few queries to tell it to use UTF-8 for the connection (the default is indeed Latin-1). Here's what I use:

SET CHARACTER SET = "utf8";
SET character_set_database = "utf8";
SET character_set_connection = "utf8";
SET character_set_server = "utf8";

I know some of these seem overkill, but they have been tested and do seem to work quite well...

ircmaxell
  • 163,128
  • 34
  • 264
  • 314
  • "i do not know what does this magical spell mean but when I chant it, it always rains on my crops". very intelligent – Your Common Sense Jan 06 '11 at 15:41
  • 1
    @Col: No, but I have experimented with different versions and different settings, and this appears to be the only combination that works in 100% of the cases. `SET NAMES` doesn't actually change the connection character set, so it can lead to issues. If you don't like answers from experience, that's fine. Show why it's wrong or why I should do something else. Otherwise, just don't bother... – ircmaxell Jan 06 '11 at 15:42
  • none of your chants do actually change connection character set as well – Your Common Sense Jan 06 '11 at 15:47
  • @Col: One edge-case example that `SET NAMES` doesn't work for is `LOAD DATA INFILE`, which will use the `character_set_database` setting (which is defaulted to `latin1`). So yes, there is reason to do more than just `SET NAMES`. And what I posted does actually change the connection character set (But you are correct, the extra `charset_connection` call is redundant)... – ircmaxell Jan 06 '11 at 15:55
  • what does `mysql_client_encoding()` say after all these manipulations? – Your Common Sense Jan 06 '11 at 15:57
  • @Col: it says `latin1`, just the same as it says if you use `SET NAMES`. It's a known issue with PHP that the only way to change the return value of that function is by calling `mysql_set_charset()`. See [this comment in the docs](http://www.php.net/manual/en/function.mysql-client-encoding.php#76668). The better test would be to test the return of the query: `SHOW VARAIBLES LIKE 'character_set%'` – ircmaxell Jan 06 '11 at 16:07
  • Thanks! For now I'll use SET NAMES 'utf8' as it seems to work fine and I don't think I'll use LOAD DATA INFILE anyway. This adds one extra query on every page where I use the DB, but I figure this won't be a performance issue. – Mathieu M-Gosselin Jan 06 '11 at 18:06
0

My guess is the data wasn't utf-8-encoded when it hit the database.

Oswald
  • 31,254
  • 3
  • 43
  • 68