2

I have been struggling with encoding problems in MySQL for a while. I am building a database that will contain not only Latin but Cyrillic and Arabic text as well. So here is an example on how I create the database:

CREATE DATABASE db1
DEFAULT CHARACTER SET utf8   
COLLATE utf8_unicode_ci;

Then a table:

CREATE TABLE TempTb1
(
ID INT PRIMARY KEY,
name VARCHAR(100) NOT NULL,
arabic VARCHAR(100) NOT NULL
)
DEFAULT CHARACTER SET utf8   
COLLATE utf8_unicode_ci;

And when I put some data and select it I get only some strange characters. So I wrote a small PHP script to test it but it doesn't work either:

<?php
header('Content-type: text/plain; charset=utf-8');

$a = mysql_connect('localhost','root','') or die('Problem connecting to database!');
$b = mysql_select_db('db1') or die('Problem selecting database');
mysql_set_charset('utf8');
mysql_query("set names 'utf8'");
mysql_query('set character set utf8');

$query = mysql_query("SELECT * FROM Tb1;");

while($row = mysql_fetch_assoc($query))
    {   
        $id = $row['ID'];
        $name = $row['name'];
        $arabic = $row['arabic'];

        echo $id.' '.$name.' '.$arabic.PHP_EOL;
    }
?>

I have tested with both utf8_unicode_ci and utf8_general_ci. What could be wrong? BTW I have EasyPHP 5.2.10.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
RegedUser00x
  • 2,313
  • 5
  • 27
  • 34
  • i think you should use `ISO5` – mgraph Jun 28 '12 at 22:48
  • 2
    Not sure about the char encoding, but it looks like you're using the old MySQL APIs, it's recommended you use the newer PDO API. – Drizzt321 Jun 28 '12 at 22:49
  • 1
    collation is not the encoding, just the sort order. So you can ignore it for the start and concentrate on the actual encoding. Then if you run a mysql query, you should check the return value and do error reporting if things go wrong. Then `SET NAMES` is discouraged. And the `mysql_*` functions are discouraged as well, please to not use them for new code. – hakre Jun 28 '12 at 22:53
  • The testing code looks good and tidy, proper utf-8. If the `SHOW FULL COLUMNS FROM Tb1;` does not show any character set... I'd say non-utf-8 data is crammed into utf-8 columns (which can be rectified, when you know the character set by `CONVERT`ing the table in order to `bin -> actual charset -> utf8`. – Wrikken Jun 28 '12 at 22:53
  • Why do you think the problem is with Mysql, and not PHP? – Zoredache Jun 28 '12 at 22:58
  • Are you 100% sure the data in DB is UTF-8? It's not enough to read it as UTF-8 if it wasn't written as UTF-8. – c2h5oh Jun 28 '12 at 23:14

2 Answers2

1

Whatever happens to your characters, happens before they reach to MySQL, I guess. Characters are converted to numbers by the computer when we enter the characters. Then these numbers travel from here to there, between web forms and servers, web servers and scripting interpreters, then database servers and back to web pages following the same way.

Where and how you enter your data? Data should exit the way it entered. If your data is provided via web forms, check your web page encodings and how you submit forms. How you get them in your PHP scripts and how you send them to database server. The guilty part here is probably not MySQL but another place. It can be MySQL too; but it is not the only place of possible misbehavior and it probably is not.

Check your pages, check headers as they arrive to your browser.

About comments your question received, no it is not good to use ISO5 because you need multiple of ISO5 families. You must go with a Unicode encoding, for most of the time, the best being utf-8. Also, this is not about which MySQL library you use unless that library has some known bugs which is very unlikely for something that old. :) You should still use whatever recommended as best practices; but your current problem is not related to the library you use. The evil is at the difference between how you enter your data and how you view it.

hasanyasin
  • 6,222
  • 1
  • 17
  • 16
  • I import the data from XLSX files via the import wizard that comes with DreamCoder for MySQL. – RegedUser00x Jun 28 '12 at 23:15
  • Hm, you are right. I inserted some data from the web interface and it works perfectly. Obviously DreamCoder does something wrong when importing the data. – RegedUser00x Jun 28 '12 at 23:19
  • DreamCoder problem is probably about settings. I have not used it before; but it looks like a native Windows application that uses Windows' UI components. So, this can be caused even by the OS that DreamCoder works on if the OS enforces a character set. Browsers are the best for having ultimate control over character encoding. Anyway, I am glad that you could solve your problem. – hasanyasin Jun 28 '12 at 23:34
  • I wrote a PHP script that inserted the data into the tables from XML files so it's not that much of an issue :) Problem is solved now. Thanks for the help. – RegedUser00x Jun 29 '12 at 00:29
0

ensure that you have this meta in your page's header

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" >

Also you could try executing this query right after connecting to the db :

"SET NAMES 'utf8'"

Hope this helps. Cheers

Razvan Pandor
  • 23
  • 1
  • 3