3

Cannot find the reason why I am unable to store in a MySQL database characters like ţ, î, ş.

My table definition is:

CREATE TABLE IF NOT EXISTS `gen_admin_words_translated` (
  `id` int(10) NOT NULL AUTO_INCREMENT,
  `word_id` int(10) NOT NULL,
  `value` text COLLATE utf8_unicode_ci,
  `lang_id` int(2) NOT NULL,
  `needUpd` int(1) NOT NULL DEFAULT '1',
  PRIMARY KEY (`id`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=2689 ;

The connection to the database is done with the following script:

$charset = "UTF8";
$link = mysql_connect($host, $user, $pass);
if(!$link){
    die("Unable to connect to database server.");
}
mysql_selectdb($database);
if(function_exists("mysql_set_charset")){
    mysql_set_charset($charset, $link);
}else{
    mysql_query("SET NAMES $charset");   
}

I have on the head part of the page:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

and the script is:

$text = 'ţ, î, ş';
mysql_query("insert into gen_admin_words_translated (word_id, lang_id, value, needUpd) values (1, 1, '$text', 1)");

All I get in the end in the table is:

SELECT * FROM  `gen_admin_words_translated` 

id   word_id value lang_id needUpd
5166 1034    ?,    1       1
Mihai Iorga
  • 39,330
  • 16
  • 106
  • 107
CristiC
  • 22,068
  • 12
  • 57
  • 89
  • 1
    Since you are trying to insert character literals... is your script's text encoding (as saved on disk) also UTF-8? – Jon Sep 16 '11 at 07:31
  • @Jon: How can I check that? I am running the script on a server, not locally. – CristiC Sep 16 '11 at 07:33
  • Depends on your editor, but you can also do a quick and dirty check: add the character `€` to your file somewhere and save. If the file size changes by 1 or 2 bytes instead of 3, you are not on UTF-8. – Jon Sep 16 '11 at 07:36

6 Answers6

4

as I ran your script it worked for me:

$charset = "UTF8";
$link = mysql_connect('localhost', 'root', '') or die('connection?');
mysql_select_db('test') or die('database?');
if(function_exists("mysql_set_charset")){
    mysql_set_charset($charset, $link);
}else{
    mysql_query("SET NAMES $charset");   
}

$text = 'ţ, î, ş';
mysql_query("insert into gen_admin_words_translated (word_id, lang_id, value, needUpd) values (1, 1, '$text', 1)");

$query = mysql_query('SELECT * FROM  `gen_admin_words_translated`');
$array = mysql_fetch_array($query);

print_r($array)

result:

Array
(
    [0] => 2689
    [id] => 2689
    [1] => 1
    [word_id] => 1
    [2] => ţ, î, ş
    [value] => ţ, î, ş
    [3] => 1
    [lang_id] => 1
    [4] => 1
    [needUpd] => 1
)

things to check:

check if your webpage is really UTF-8, maybe you have some chaset set another place.

header('Content-type: text/html; charset=utf-8');

file encoding should be also UTF-8 as it may break your characters if otherwise ..

Mihai Iorga
  • 39,330
  • 16
  • 106
  • 107
  • 1
    The encoding of his webpage has nothing to do with MySql. – Jon Sep 16 '11 at 07:40
  • the encoding with his webpage has to do with the insert, the characters don't even get to database if the page is not UTF8. Try opening an notepad, type those characters and close it as ANSI and you'll see that it breaks the characters. – Mihai Iorga Sep 16 '11 at 07:43
  • Sorry, my mistake for not being exact. I mean that *what he is trying right now* (which does not work) has nothing to do with his webpage. Also, he has a `meta` tag in there which, while not ideal, should suffice for the browser. – Jon Sep 16 '11 at 07:48
  • Thanks Mihai. It was an issue with my editor as it wasn't encoding in UTF8. – CristiC Sep 16 '11 at 21:28
3

Expanding my comments into an answer:

It seems that you have set up things correctly, and are only stuck on inserting a string literal to the database. To do that successfully you must also ensure that your text encoding for the saved PHP script is also UTF-8.

Most decent editors will let you know which encoding you are currently working with and can also save as (i.e. convert between) different encodings (even Notepad does this today). However, as a quick check you can add the character to your file somewhere and save it. If the file size changes by 1 or 2 bytes instead of 3, you are not on UTF-8 and you need to convert the file to that encoding.

Other than that, when receiving text as input from the browser your code should handle it just fine.

Note: While using a <meta> tag to set the encoding for your page should be sufficient, it's better if you do this with an HTTP header from PHP like this:

header('Content-type: text/html; charset=utf-8');
Jon
  • 428,835
  • 81
  • 738
  • 806
  • Thank you Jon for this. I spent more than one hour with such a stupid thing. Indeed my improvised editor was not using UTF-8 encoding. – CristiC Sep 16 '11 at 21:25
2

Does the last result you pasted come from MySQL Command-Line? If does, try SET NAMES utf8; before query SELECT * FROM gen_admin_words_translated

l99933
  • 149
  • 1
  • 5
  • 2
    He already does that. Did you read the code? (I could say the same to the upvoter) – Jon Sep 16 '11 at 07:39
  • Yes, he did, but only in the PHP code, right? What I am not sure is he was using MySQL Command-Line or other tools to execute the SELECT query. – l99933 Sep 16 '11 at 07:56
2

If this:

$text = 'ţ, î, ş';

is your literal code, you need to make sure that the PHP source file is encoded as UTF-8 as well. Otherwise, these characters will be ISO-8859-1 characters in a Unicode context, resulting in broken characters.

Pekka
  • 442,112
  • 142
  • 972
  • 1,088
2

Check your MySQL initialization file. It should include these character-set lines:

[client]
port=3306

[mysql]
default-character-set=utf8
port = 3306
#
[mysqld]
basedir=".....
#Path to the database root
datadir=".....
# The default character set that will be used when a new schema or table is
# created and no character set is defined
character-set-server=utf8
Costis Aivalis
  • 13,680
  • 3
  • 46
  • 47
  • This answer is now deprecated as the default-character-set should be `utf8mb4`, not `utf8`, since `utf8` in MySQL is limited to 3 bytes and cannot represent all Unicode characters. – PatPeter Jul 05 '18 at 18:51
1

In this statement, you are inserting characters as they exist in the current PHP file:

$text = 'ţ, î, ş';

However, they will be encoded using the character encoding of your PHP file. Unless this PHP file uses UTF-8 encoding itself, the resulting string won't be UTF-8 encoded.

You should use your text editor to check the character encoding used on the current file. All decent text editors should be able to display, and some may be able to convert, the character encoding used in a document.

To create more portable code, ensuring the character encoding of your document doesn't matter, you can use encoded values like this:

$text = "\xC5\xA3, \xC3\xAE, \xC5\x9F";

Unfortunately, if you have to do a lot of this it'll be a pain, because you have to use the multi-byte hex representation - PHP doesn't have a native Unicode way of specifying characters like some other languages (where you can go "\u163" instead of "\xC5\xA3").

You can look up the UTF-8 representation in hex using tools like this.

thomasrutter
  • 114,488
  • 30
  • 148
  • 167