I get a UTF-8 string from the DB and am trying to echo its first character:

$title = $model->title;
echo $title[0];

I get:

What's wrong?

Álvaro González
Nick_NY

4 Answers

$first_char = mb_substr($title, 0, 1);

You need to use PHP's multibyte string functions to properly handle Unicode strings:

http://www.php.net/manual/en/ref.mbstring.php

http://www.php.net/manual/en/function.mb-substr.php
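To make the difference concrete, here is a minimal sketch assuming a hypothetical title "Élan", whose first letter takes two bytes in UTF-8. It compares array-style indexing with mb_substr(), and strlen() with mb_strlen(). The title is written with a \u{...} escape (PHP 7+) so the example does not depend on how the file itself is saved.

```php
<?php
// Hypothetical title "Élan": "É" (U+00C9) is the two bytes 0xC3 0x89 in UTF-8.
$title = "\u{00C9}lan";

// Array-style indexing works on bytes, so this is only the first byte of "É".
$firstByte = $title[0];

// mb_substr() counts characters, so this is the whole "É".
$firstChar = mb_substr($title, 0, 1, 'UTF-8');

echo bin2hex($firstByte), "\n"; // c3
echo $firstChar, "\n";          // É

// strlen() counts bytes, mb_strlen() counts characters.
echo strlen($title), ' bytes, ', mb_strlen($title, 'UTF-8'), " characters\n"; // 5 bytes, 4 characters
```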

You'll also need to specify the character encoding in the <head> of your HTML:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

or, only if the page itself is actually encoded as UTF-16:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-16" />
Botond Balázs

There are several things you need to consider:

  1. Check that data in the DB is being stored as UTF-8
  2. Check that the client connection to the DB is in UTF-8 (for example, in MySQL see: http://www.php.net/manual/en/mysqli.character-set-name.php)
  3. Make sure that the page has its content type set to UTF-8 [you can use header('Content-Type: text/html; charset=utf-8'); ]
  4. Try setting the internal encoding, using mb_internal_encoding("UTF-8");
  5. Use mb_substr instead of array index notation
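The checklist above can be sketched in PHP roughly as follows. This is a minimal sketch, not a drop-in fix: connectUtf8() and firstChar() are hypothetical helpers, the connection arguments are placeholders, and step 2 assumes the MySQLi extension. Note that mb_check_encoding() only verifies that the string PHP received is valid UTF-8; it does not inspect how the column is stored in the DB.

```php
<?php
// Step 2 (hypothetical helper): ask MySQLi to use UTF-8 on this connection.
// Host, user, password and database are placeholders supplied by the caller.
function connectUtf8(string $host, string $user, string $pass, string $db): mysqli
{
    $conn = new mysqli($host, $user, $pass, $db);
    $conn->set_charset('utf8mb4'); // utf8mb4 is MySQL's full UTF-8 character set
    return $conn;
}

// Step 3: declare the page encoding; headers can only be sent before any output.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// Step 4: make UTF-8 the default encoding for mb_* functions.
mb_internal_encoding('UTF-8');

// Step 5 (hypothetical helper): return the first character, not the first byte.
function firstChar(string $title): string
{
    // Runtime sanity check: reject strings that are not valid UTF-8.
    if (!mb_check_encoding($title, 'UTF-8')) {
        throw new InvalidArgumentException('Title is not valid UTF-8');
    }
    // No encoding argument needed here because of step 4.
    return mb_substr($title, 0, 1);
}

echo firstChar("\u{00C9}lan"), "\n"; // É
```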
Paul S
    ...and as mentioned in the other two answers you should be using 'mb_substr' – Paul S Nov 22 '12 at 09:09
    This was useful to me too. Thanks. – Botond Balázs Nov 22 '12 at 09:16
    I know this is old, but the issue is still relevant today. As far as I know, doing all of this is not going to solve anything if you still use the array notation `$title[0]`. @PaulS said "you should be using `mb_substr`", but can someone confirm you **must** use `mb_substr`? And if so, shouldn't we edit this accepted answer to add this crucial point? – sylbru Jul 04 '17 at 13:36
    @Niavlys I can confirm that. This answer does not solve the actual problem. – Michael Härtl Apr 05 '18 at 09:24

As previously mentioned in other questions, PHP's plain string handling doesn't understand multibyte characters (such as those in UTF-8) when you take a substring: it works on bytes.

What the other answers don't mention is that you should pass the encoding you want mb_substr to use as its fourth argument.

So, for example, I use this:

 mb_substr( "Sunday", 0, 1,'UTF8'); // Returns S
 mb_substr( "воскресенье", 0, 1,'UTF8'); // Returns в
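To see why the explicit encoding matters, here is a sketch that deliberately switches the internal encoding to ISO-8859-1 (chosen only for the demonstration). Without the fourth argument, mb_substr falls back to the internal encoding and returns just the first byte; with 'UTF-8' it returns the whole character. The original setting is restored at the end.

```php
<?php
// Remember the current setting so it can be restored afterwards.
$previous = mb_internal_encoding();

// Deliberately wrong internal encoding, just for this demonstration.
mb_internal_encoding('ISO-8859-1');

$word = "воскресенье"; // UTF-8 text: "в" is the two bytes 0xD0 0xB2

$implicit = mb_substr($word, 0, 1);          // falls back to ISO-8859-1: one byte
$explicit = mb_substr($word, 0, 1, 'UTF-8'); // explicit UTF-8: one character

echo bin2hex($implicit), "\n"; // d0
echo $explicit, "\n";          // в

mb_internal_encoding($previous);
```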
Layke

PHP strings don't understand multibyte characters by default: array-like indexing returns only the first byte, and if that byte is not in the ASCII range you get this result.

Use the mb_substr function instead.
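A small sketch of the point above, reusing the two sample words from Layke's answer. firstByteIsAscii() is a hypothetical helper: it checks whether $title[0] happens to be a complete ASCII character, which is the only case where array-style indexing gives the right answer. In UTF-8, every byte of a multibyte character is 0x80 or higher.

```php
<?php
// Hypothetical helper: true when the first byte is ASCII (0x00-0x7F),
// i.e. when $s[0] is a whole character on its own.
function firstByteIsAscii(string $s): bool
{
    return $s !== '' && ord($s[0]) < 0x80;
}

foreach (["Sunday", "воскресенье"] as $title) {
    printf(
        "%s: byte indexing %s, mb_substr gives \"%s\"\n",
        $title,
        firstByteIsAscii($title) ? 'is safe' : 'breaks the character',
        mb_substr($title, 0, 1, 'UTF-8')
    );
}
```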

complex857