
I'm using MySQL 5.1 and loaded about 2.7 million lines from a UTF-8 encoded text file into a table using LOAD DATA INFILE. The table itself is declared with utf8_unicode_ci, and so are all of its character columns.
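One detail relevant to this step: according to the MySQL manual, LOAD DATA INFILE interprets the file using the `character_set_database` system variable (SET NAMES does not affect it) unless the statement carries a CHARACTER SET clause. Below is a minimal PHP sketch that builds such a statement. The helper name, table name and file path are hypothetical, and the path is assumed to be trusted input. On MySQL 5.1 the charset would be `utf8`, since `utf8mb4` only arrived in 5.5.3.

```php
<?php
// Hypothetical helper: builds a LOAD DATA statement that states the file's
// encoding explicitly instead of relying on character_set_database.
function build_load_data_sql(string $path, string $table, string $charset = 'utf8mb4'): string
{
    // Only allow plain identifiers for the table and charset names.
    foreach ([$table, $charset] as $name) {
        if (!preg_match('/^[A-Za-z0-9_]+$/', $name)) {
            throw new InvalidArgumentException("Invalid identifier: $name");
        }
    }
    // Escape quotes/backslashes for a single-quoted SQL literal
    // (assumes the default sql_mode, where backslash is the escape character).
    $quotedPath = "'" . addslashes($path) . "'";
    // CHARACTER SET must follow INTO TABLE in LOAD DATA syntax.
    return "LOAD DATA INFILE $quotedPath INTO TABLE `$table` CHARACTER SET $charset";
}

// MySQL 5.1 has no utf8mb4, so pass 'utf8' there.
echo build_load_data_sql('/tmp/words.txt', 'words', 'utf8'), PHP_EOL;
```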

In the database itself all the characters look correct. However, when I print them with PHP, they show up as ???, even though I declare UTF-8 in the HTML head:

```html
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
...
```

In another table (also utf8), where I inserted text submitted from a form, the characters look garbled in the database but display correctly again when I print them after a SELECT.

So what is wrong? Should UTF-8 characters look correct in the database itself, or is it normal for them to look strange there and come out fine when you SELECT them? Where is the problem: when loading the file into the database, in the HTML, or somewhere in between?
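To illustrate the "strange" characters: when UTF-8 bytes pass through a connection that MySQL believes is latin1, each byte is treated as a separate latin1 character and re-encoded as UTF-8 on the way into a utf8 column. This minimal PHP sketch reproduces that double encoding with a hypothetical helper. It treats latin1 as ISO-8859-1; MySQL's latin1 is actually cp1252, which only differs in the 0x80-0x9F range.

```php
<?php
// Hypothetical helper: reinterpret each byte of $bytes as an ISO-8859-1
// character and encode that character as UTF-8. Applying this to text that
// is already UTF-8 reproduces "double encoding" mojibake.
function latin1_bytes_to_utf8(string $bytes): string
{
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            $out .= $bytes[$i];                  // ASCII bytes are unchanged
        } else {
            $out .= chr(0xC0 | ($b >> 6))        // 2-byte UTF-8 lead byte
                  . chr(0x80 | ($b & 0x3F));     // continuation byte
        }
    }
    return $out;
}

$original = "\u{E1}";                          // "á", bytes C3 A1
$garbled  = latin1_bytes_to_utf8($original);
echo $garbled, PHP_EOL;                        // prints "Ã¡"
echo bin2hex($garbled), PHP_EOL;               // prints "c383c2a1"
```

Reading such a row back over the same latin1 connection typically undoes the conversion, which is how text can look wrong in phpMyAdmin yet print correctly from the PHP page.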

Thank you very much for any hint or suggestion! :)

Chris
    Are you issuing the `SET NAMES utf8` command before running MySQL queries? Are you sure your page is actually rendered as utf-8 (if there's an HTTP header `Content-Type: text/html; charset=iso-8859-1`, browsers disagree about which should win)? – DCoder Apr 28 '12 at 20:15
  • I also suggest reading http://www.joelonsoftware.com/articles/Unicode.html and making sure you use UTF-8 EVERYWHERE: MySQL storage, the MySQL connection, PHP itself, the response header, ... – Styxxy Apr 28 '12 at 20:16
  • @DCoder: well, the problem is that I loaded the data from a file, and since I had trouble getting the file path right, I used the phpMyAdmin interface instead of a plain SQL command (LOAD DATA INFILE). And what do you mean by the HTTP header? Sorry if this is a stupid question, but is it the same as the http-equiv="Content-Type"? – Chris Apr 28 '12 at 20:19
  • In your *PHP* code, when you set up the db connection, you need to execute `SET NAMES utf8` to tell MySQL you'll be sending and receiving data in UTF-8. As for the HTTP headers, it is possible your php/Apache is configured to send a [`Content-Type: text/html;charset=iso-8859-1`](http://lachy.id.au/log/2006/01/content-type) header with the response. If that happens, the browser can be left unsure whether it should render the page as iso-8859-1 or utf-8. – DCoder Apr 28 '12 at 20:24
  • @Styxxy: thx for the link to this interesting article! – Chris Apr 28 '12 at 20:30
  • @DCoder: so should the characters look correct in the database itself or not? – Chris Apr 28 '12 at 20:30
  • Also check your editor's character encoding. Whatever charset you declare in your HTML, the result may not be what you expect unless your editor's character encoding matches it. – Doğan AHMETCİ Apr 28 '12 at 20:31
  • @DoğanAHMETCİ: thx, yes as stated the text-file is utf-8 – Chris Apr 28 '12 at 20:34
  • @Chris I didn't mean the text file but the HTML file encoding. It seems your problem is not the text file but your PHP/HTML. – Doğan AHMETCİ Apr 28 '12 at 20:43
  • OK, thank you all for your suggestions! Can anyone post an answer to the question of whether the characters should be displayed correctly in the database itself (meaning, e.g., that á is shown as á and not as ö, À or something similar) and then printed correctly? :) – Chris Apr 28 '12 at 20:48
  • You should read this: http://kunststube.net/frontback/ . If phpMyAdmin displays your entered data as correct Unicode text, then my bet is that you are not doing `SET NAMES utf8` after connecting. – DCoder Apr 29 '12 at 05:52
  • @DCoder: sorry for the very late reply; unfortunately I couldn't get back to this issue earlier, but I wanted to let you know you were right: with SET NAMES utf8 all characters now display correctly. Since you haven't posted an answer, could you post one so I can accept it? Thank you very much again! :) – Chris May 15 '12 at 14:59
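On the HTTP header question: `http-equiv` in a meta element only imitates a header from inside the document, whereas a real `Content-Type` header is sent by the server before the HTML. A minimal PHP sketch that sends the header explicitly, so it cannot disagree with the meta tag, and escapes output as UTF-8. The function name `render_page` is hypothetical.

```php
<?php
// Hypothetical helper: returns an HTML page whose meta tag matches the
// charset sent in the HTTP header below.
function render_page(string $title, string $body): string
{
    // Tell htmlspecialchars() the input is UTF-8; the default charset
    // was not UTF-8 before PHP 5.4.
    $t = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');
    $b = htmlspecialchars($body, ENT_QUOTES, 'UTF-8');
    return "<!DOCTYPE html>\n<html>\n<head>\n"
         . "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">\n"
         . "<title>$t</title>\n</head>\n<body>\n<p>$b</p>\n</body>\n</html>\n";
}

// header() must run before any output is sent; under the CLI it has no effect.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}
echo render_page("Caf\u{E9}", "\u{E1} & \u{F6}");
```

Without ENT_SUBSTITUTE, `htmlspecialchars()` returns an empty string for invalid UTF-8, so a stray latin1 byte coming from a misconfigured connection can make text silently disappear rather than show as ???.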

2 Answers


Note: MySQL's utf8 charset is limited: it only supports characters in the Basic Multilingual Plane (BMP), i.e. those that take at most three bytes in UTF-8. You should be using utf8mb4 instead (available since MySQL 5.5.3).
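To see which text is affected by that limit, here is a minimal PHP sketch with a hypothetical helper that flags characters outside the BMP, i.e. the ones that need four bytes in UTF-8 and therefore need a utf8mb4 column.

```php
<?php
// Hypothetical helper: true if $text (valid UTF-8) contains any character
// outside the BMP, i.e. one that needs 4 bytes in UTF-8 and so cannot be
// stored in a MySQL column using the 3-byte utf8 character set.
function needs_utf8mb4(string $text): bool
{
    // The /u modifier makes \x{...} refer to code points rather than bytes.
    $result = preg_match('/[\x{10000}-\x{10FFFF}]/u', $text);
    if ($result === false) {
        // preg_match() fails on invalid UTF-8 when /u is used.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $result === 1;
}

var_dump(needs_utf8mb4("caf\u{E9}"));   // bool(false): U+00E9 is 2 bytes
var_dump(needs_utf8mb4("\u{20AC}"));    // bool(false): U+20AC is 3 bytes
var_dump(needs_utf8mb4("\u{1F600}"));   // bool(true): U+1F600 is 4 bytes
```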

If phpMyAdmin displays your entered data as correct Unicode text, then my bet is that you are not doing `SET NAMES utf8` after connecting.
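A minimal sketch of doing this from PHP, assuming the mysqli or PDO extension and a server new enough for utf8mb4 (on 5.1, use `utf8`). The PHP manual recommends `mysqli::set_charset()` over sending `SET NAMES` as a query, because it also informs the client library, which `mysqli::real_escape_string()` relies on. The function names and connection parameters are hypothetical.

```php
<?php
// Hypothetical helper: open a mysqli connection whose client, connection and
// result character sets are all utf8mb4 (the effect of SET NAMES utf8mb4).
function connect_utf8mb4(string $host, string $user, string $pass, string $db): mysqli
{
    $conn = new mysqli($host, $user, $pass, $db);
    if ($conn->connect_errno) {
        throw new RuntimeException('Connect failed: ' . $conn->connect_error);
    }
    if (!$conn->set_charset('utf8mb4')) {
        throw new RuntimeException('set_charset failed: ' . $conn->error);
    }
    return $conn;
}

// Hypothetical helper: the PDO equivalent puts the charset in the DSN
// (honoured by PDO_MYSQL since PHP 5.3.6).
function utf8mb4_dsn(string $host, string $db): string
{
    return "mysql:host=$host;dbname=$db;charset=utf8mb4";
}
```

Usage would look like `$pdo = new PDO(utf8mb4_dsn('localhost', 'mydb'), $user, $pass);` or `$db = connect_utf8mb4('localhost', $user, $pass, 'mydb');`.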

DCoder

Try running the following right after connecting to the database, before you fetch any data:

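```php
// Together, the first three settings are what SET NAMES utf8 sets.
$db->query('SET character_set_client = utf8');     // charset of statements sent by PHP
$db->query('SET character_set_connection = utf8'); // charset statements are converted to
$db->query('SET character_set_results = utf8');    // charset of results sent back to PHP
// Optional: only changes the session default for new databases.
$db->query('SET character_set_server = utf8');
```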
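To confirm that these statements took effect, you can run `SHOW VARIABLES LIKE 'character_set%'` on the same connection and check the three per-connection variables. Here is a sketch of the checking logic, written as a hypothetical pure function over the `Variable_name => Value` pairs so it can be tried without a server:

```php
<?php
// Hypothetical helper: given rows from SHOW VARIABLES LIKE 'character_set%'
// as a Variable_name => Value map, return the per-connection variables that
// are not set to a UTF-8 character set (missing ones map to null).
function non_utf8_connection_vars(array $vars): array
{
    $wanted = ['character_set_client', 'character_set_connection', 'character_set_results'];
    $bad = [];
    foreach ($wanted as $name) {
        $value = $vars[$name] ?? null;
        if (!in_array($value, ['utf8', 'utf8mb3', 'utf8mb4'], true)) {
            $bad[$name] = $value;
        }
    }
    return $bad;
}

// With a live mysqli connection $db, the map could be built like this:
// $vars = [];
// $res = $db->query("SHOW VARIABLES LIKE 'character_set%'");
// while ($row = $res->fetch_assoc()) { $vars[$row['Variable_name']] = $row['Value']; }
```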
DaneSoul