16

after setting all config file and runtime options for charset that i can find to utf-8, new mysqli connections made with php still has its charset set to latin1, which effectively means that i have to call $mysqli->set_charset('utf8') each time i connect.

$mysqli = new mysqli(DB_HOST, DB_USER, DB_PASS, DB_NAME);  
if ($mysqli->connect_error)  
  err_handle("mysql connect error({$mysqli->connect_errno}).");  
if (!$mysqli->set_charset("utf8"))  
  err_handle("db error({$mysqli->errno}).");

i wonder if there is a permanent way of doing this?

similar problem was encountered in this post.


a "show variables like 'character_set%'" query on the mysql server before calling $mysqli->set_charset('utf8') shows:
(this part was ambiguous in previous revs)

character_set_client    latin1  
character_set_connection    latin1  
character_set_database  utf8  
character_set_filesystem    binary  
character_set_results   latin1  
character_set_server    utf8  
character_set_system    utf8  

the client, connection and results charset can only be changed to utf8 with $mysqli->set_charset('utf8') at runtime. after that it shows:

character_set_client    utf8  
character_set_connection    utf8  
character_set_database  utf8  
character_set_filesystem    binary  
character_set_results   utf8  
character_set_server    utf8  
character_set_system    utf8  

i have

default_charset = "utf-8"

set in php.ini, and

[client]  
default-character-set=utf8  
...  
[mysqld]  
## This option is deprecated in favor of --character-set-server.
#default-character-set=utf8  

set in my.cnf.

the default charset for my tables is also utf8.

seems like the "[client]" options only affect the cmd "mysql" tool and have nothing to do with php.

the return value of $mysqli->character_set_name() is always latin1 no matter what i do, until $mysqli->set_charset('utf8') is called.

i guess "latin1" is a mysql thing, since i cant recall anything else that defaults to "latin1" on my system.

^update: according to mysql manual 9.1.4, 9.1.5 and 5.1.3, character_set_client should be provided by the client. i guess php doesn't provide it upon connection and mysql uses the fall-back charset latin1.

i'm running php 5.3 on debian wheezy with mysql 5.1.

any suggestion?


updated with info from comments:

i forgot to mention the skip-character-set-client-handshake directive and why i was reluctant to use it.

upon first sight i thought ignoring the handshake might result in the situation that the client talks latin1 while the server talks utf8. how does the server convert the string from charset character_set_client to character_set_server without knowing the charset currently in use?

correct me if i'm wrong, plz. i will experiment with this setting later today to see if it works.

Updated with workaroud:

make sure everything works under utf-8 (or any preferable charset). then add the skip-character-set-client-handshake line to my.cnf.

this works for me so far. i experimented with some double-width utf-8 characters. both insert and select succeeded and displayed properly in the browser.

what skipping the handshake means is still unclear. and the mysql server now becomes uncapable of using any charset except utf-8, whick makes this workaround quite impractical since i simply cant apply this setting to all the servers that my website runs on.

so i'm not adopting this workaround. further comments and answers are much appreciated.

Community
  • 1
  • 1
Huang Tao
  • 2,254
  • 2
  • 26
  • 31
  • Have you tried changing the system locale setting with `setlocale()`? – Narf Apr 30 '11 at 16:22
  • @Narf the locales that are available to php (output of `locale -a`) on my system are C, POSIX, and en_US.utf8. moreover, `setlocale()` cant be a permanent solution since _"locale information is maintained per process"_, according to php manual. – Huang Tao Apr 30 '11 at 17:58
  • en_US.utf8 would be the correct one if this should work. It is logical the default charset for every program to be set as the current system locale, unless otherwise configured. If you run PHP as an Apache module it could be possible to set the locale as an apache directive or you could probably even set the default system locale to be en_US.utf8. Anyway ... it's just a thought in case nothing else works. – Narf Apr 30 '11 at 22:41
  • @Narf yep, latin1 makes no sense on my system, where the default locale is en_US.utf8, and every program is configured to use utf8. maybe i should try the mysql mailing list for some help on this issue. – Huang Tao May 03 '11 at 17:48
  • @Unilsland please see my update and give me feedback – Jeremy S. May 04 '11 at 15:32

3 Answers3

20

You have diagnosed the basic problem correctly: While you can change the default MySQL client charset in the client machine's my.cnf or .my.cnf, these files are not used by PHP.

If you think about how PHP's MySQLi/MySQL extensions work, this will make sense -- they have nothing to do with the mysql client program and aren't going to crawl your filesystem for config files, because they use libmysql directly.

To change libmysql's actual default charset, you'll just need to rebuild libmysql. That may not be an answer you like (since you're using precompiled MySQL binaries), but it is the actual answer. The defaults are set at compile time, and then can be overridden at runtime.

If you don't want to do this and calling set_charset() annoys you, my suggestion would be to simply extend the MySQLi class and use that class in place of mysqli. i.e.:

class MyDB extends mysqli {
  // (You could set defaults for the params here if you want
  //  i.e. $host = 'myserver', $dbname = 'myappsdb' etc.)
  public function __construct($host = NULL, $username = NULL, $dbname = NULL, $port = NULL, $socket = NULL) {
    parent::__construct($host, $username, $dbname, $port, $socket);
    $this->set_charset("utf8");
  } 
} 

Typically in an application you'll have some kind of database abstraction layer anyway, so you can either have this layer use MyDB instead of mysqli, or you can have this layer be MyDB and add or override any methods you want (I've done this with simple ORM-less apps).

It's a good practice to always have some kind of database abstraction layer, even if it starts as just class MyDB extends mysqli {} because then you'll never have to search/replace your entire codebase to make small changes.

RE: your workaround, as you explain, this essentially hardcodes your entire db server to UTF-8 regardless of what clients request. Instead of having multiple databases, each with its own charset, the server only works with UTF-8 and may silently mangle data if clients connect with another charset. This is fundamentally wrong because you've effectively moved one aspect of your application's configuration (database charset) from the app/client machine to the database server where it doesn't really belong.

If you think about the application stack's layers,

[server] <=> [network] <=> [client libmysql] <=> [PHP binary] <=> [app]

then you'll understand that the "correct" place for an app-specific configuration like this is in the app itself, not elsewhere in the stack. You may not like having to specify your database's charset in PHP, but if you think about it, that's really where it belongs, because it's also where you're specifying the database itself that you want to connect to -- it's a connection parameter, not a server configuration issue. Hardcoding the charset anywhere else makes your application non-portable.

joelhardi
  • 11,039
  • 3
  • 32
  • 38
3

according to the following posts from MySQL

http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html http://dev.mysql.com/doc/refman/5.0/en/charset-applications.html

your settings are not completely right i.e.

use

 [mysqld]
 character-set-server=utf8
 collation-server=utf8_general_ci

instead of

 [mysqld]
 default-character-set=utf8

for the client I only found

 [mysql]
 default-character-set=utf8

not

 [client]
 default-character-set=utf8

try and give me some feedback.

I can remember that I once read about a setting var to switch off the ability for a client to change the character setting. But I can't find the ref in mysql documentation now. If I find it I let you know.

Hope that helps.

Regards

UPDATE

@Unisland BTW I found this thread http://www.webmasterworld.com/php/3553642.htm where a similar problem is discussed

Try either

So you may try to add a:
[mysqld]
init-connect='SET NAMES utf8'

or

[client]
default-character-set=utf8

[mysqld]
character-set-server=utf8
default-character-set=utf8
default-collation=utf8_unicode_ci
character-set-client = utf8

to set this as a default for all connections, or start with these queries after your specific script connects to the database before sending other queries: SET NAMES utf8; SET CHARACTER_SET utf8;

Jeremy S.
  • 6,423
  • 13
  • 48
  • 67
  • i have the `character-set-server` line actually, the `show variables` query output attached proves that it's working. – Huang Tao May 01 '11 at 01:51
  • i think the `[mysql]` directives only affect the cmd `mysql` tool. correct me if i'm wrong. i know i can ignore the charset info from the client with `skip-character-set-client-handshake`, but i'm a little bit reluctant to do that. doesn't it mean by ignoring the hankshake the situation might become that the client talks _latin1_ while the server reads _utf8_? – Huang Tao May 01 '11 at 01:58
  • @Unilsland try it out. what do you have to loose? The thing is if mysql detects a wrong line in my.cnf you don't really know how it treats the rest of the config. Happend to me once that I had the wrong setting in one line and it resulted in strange behavior – Jeremy S. May 01 '11 at 07:47
  • @Jeremy i replaced the `default-character-set=utf8` line with `character-set-server=utf8` in the [mysqld] directive. it doesn't work. [client] and [mysql] are both legit directives according to [manual 4.5.1](http://dev.mysql.com/doc/refman/5.1/en/mysql-command-options.html). – Huang Tao May 03 '11 at 16:13
  • @Unilsland have you tried to set `skip-character-set-client-handshake` – Jeremy S. May 03 '11 at 20:15
  • "`mysqli::set_charset` is the preferred way to change the charset. Using `mysqli::query()` to execute `SET NAMES` is not recommended"[-- php manual](http://cn.php.net/manual/en/mysqli.set-charset.php) – Huang Tao May 04 '11 at 17:24
  • setting `character-set-client` via cmdline and config file option is not supported. [-- mysql manual](http://dev.mysql.com/doc/refman/5.1/en/server-system-variables.html) – Huang Tao May 04 '11 at 17:29
0

I know this is a very old thread, but I just encountered this issue on my Fedora 30 KDE laptop after installing MariaDB (and have spent over an hour looking for the answer). On my Ubuntu 18.04 server, everything just works without having to modify any config files, but on my laptop running Fedora 30 KDE, I had to:

$ sudo vi /etc/my.cnf.d/client.cnf

and then added default-character-set in the [client-mariadb] section:

[client-mariadb]
default-character-set = utf8mb4

I had to do the same for the server config:

$ sudo vi /etc/my.cnf.d/mariadb-server.cnf

and then added the following in the [mysqld] section:

[mysqld]
character-set-server  = utf8mb4
collation-server      = utf8mb4_general_ci

Then just restart MariaDB:

$ sudo systemctl restart mariadb.service

After this, there's no need to explicitly set the character set within PHP scripts.

Scott Deagan
  • 331
  • 3
  • 8