8

I need to get an array of all the characters in a word, but the word contains non-ASCII letters such as á. When I execute the following code:

$word = 'withá';

$word_arr = array();
for ($i=0;$i<strlen($word);$i++) {
    $word_arr[] = $word[$i];
}

or

$word_arr = str_split($word);

I get:

array(6) { [0]=> string(1) "w" [1]=> string(1) "i" [2]=> string(1) "t" [3]=> string(1) "h" [4]=> string(1) "Ã" [5]=> string(1) "¡" }

How can I obtain each character as follows?

array(5) { [0]=> string(1) "w" [1]=> string(1) "i" [2]=> string(1) "t" [3]=> string(1) "h" [4]=> string(1) "á" }

randominstanceOfLivingThing
leticia

4 Answers

3

Because it is a UTF-8 string, just do

$word = 'withá';
$word = utf8_decode($word);
$word_arr = array();
for ($i=0;$i<strlen($word);$i++) {
    $word_arr[] = $word[$i];
}

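The reason is that the string is stored as UTF-8, where á takes two bytes, while strlen() and $word[$i] operate on single bytes (which is also why the multibyte mb_* functions work). utf8_decode() converts the string from UTF-8 to ISO-8859-1, in which every character is one byte, so byte indexing then lines up with characters. This only works for characters that exist in ISO-8859-1, and the resulting array elements are ISO-8859-1 rather than UTF-8; for anything else, use the mb functions.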
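To make the byte-versus-character point concrete, here is a minimal sketch. It assumes the mbstring extension is loaded, and it writes á as explicit UTF-8 byte escapes so the result does not depend on how the script file itself is saved.

```php
<?php
// "with" followed by the two UTF-8 bytes of á (0xC3 0xA1).
$word = "with\xC3\xA1";

$byteCount = strlen($word);             // counts bytes
$charCount = mb_strlen($word, 'UTF-8'); // counts characters (needs mbstring)

// Hex dump of the bytes that $word[$i] would hand back one at a time.
$hex = bin2hex($word);

echo $byteCount, ' bytes, ', $charCount, " characters\n";
echo $hex, "\n";
```

The trailing `c3a1` in the hex dump is the pair of bytes that shows up as two separate array elements in the question. Note that utf8_decode() is deprecated as of PHP 8.2, so on current versions the mb functions are the safer route.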

Tim Withers
2

I think mb_split will do it for you: http://www.php.net/manual/en/function.mb-split.php

If you're using special encodings, you probably want to read up on how PHP handles multibyte encoding in general...

EDIT: Nope, I can't figure out how to make mb_split do it myself, but looking around SO I found some other questions that were answered with preg_split. I tested this and it seems to do exactly what you want:

preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);
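As a rough illustration of how the pattern modifier affects this call: with the `u` modifier PCRE treats the subject as UTF-8, so the empty pattern matches between characters; without it, the match positions fall between bytes. The sketch below assumes the input is valid UTF-8 and writes á as explicit byte escapes.

```php
<?php
$word = "with\xC3\xA1"; // "withá" as explicit UTF-8 bytes

// Without "u": the subject is treated as single bytes, so á is cut in two.
$bytes = preg_split('//', $word, -1, PREG_SPLIT_NO_EMPTY);

// With "u": the subject is treated as UTF-8, so á stays whole.
$chars = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);

echo count($bytes), ' pieces without u, ', count($chars), " pieces with u\n";
```

If the subject is not valid UTF-8, preg_split with the `u` modifier returns false instead of an array, so it is worth checking the return value.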

I'd still strongly suggest you read up on multibyte characters in PHP though. It's kind of a mess, IMHO.

Here are some good links: http://www.joelonsoftware.com/articles/Unicode.html and http://akrabat.com/php/utf8-php-and-mysql/ (plenty more can be found).

Aerik
  • Which $pattern (the first parameter of the mb_split function) do you recommend for this case? – leticia Nov 21 '12 at 20:48
  • 1
    `mb_split` is not exactly what is needed. It only splits based on a regex. It does not directly split the string into an array of characters. About half-way through the comments on the page for the function, however, is a function that will do what is needed. – G-Nugget Nov 21 '12 at 20:51
  • @G-Nugget - excellent point, even if it's somewhat counter-intuitive. I couldn't make it work either, and have revised my answer. – Aerik Nov 21 '12 at 21:35
0

You should use the multibyte functions for all multibyte charsets! I guess mb_split is the counterpart:

http://php.net/manual/en/function.mb-split.php

wegus
0

As found at: http://www.php.net/manual/en/function.str-split.php#107658

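    // Split a UTF-8 string into an array of chunks of $l characters each.
    // With $l <= 0 (the default), split into single characters.
    function str_split_unicode($str, $l = 0) {
        if ($l > 0) {
            $ret = array();
            $len = mb_strlen($str, "UTF-8");
            for ($i = 0; $i < $len; $i += $l) {
                $ret[] = mb_substr($str, $i, $l, "UTF-8");
            }
            return $ret;
        }
        // The "u" modifier makes the empty pattern match between UTF-8 characters.
        return preg_split("//u", $str, -1, PREG_SPLIT_NO_EMPTY);
    }

    $word = 'withá';
    $word = str_split_unicode($word);
    var_dump($word);
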
Slavenko Miljic
  • Doesn't work; this returns: array(5) { [0]=> string(1) "w" [1]=> string(1) "i" [2]=> string(1) "t" [3]=> string(1) "h" [4]=> string(2) "á" } – leticia Nov 21 '12 at 21:13
  • That is strange; on my server I get: array(5) { [0]=> string(1) "w" [1]=> string(1) "i" [2]=> string(1) "t" [3]=> string(1) "h" [4]=> string(2) "á" } – Slavenko Miljic Nov 21 '12 at 21:19
  • @leticia2602 - I'm guessing your file isn't saved with UTF-8 encoding. If Tim Withers' answer worked for you and this one doesn't, your file is probably saved in some other encoding. Try using an editor (like Notepad2) that lets you specify the encoding. – Aerik Nov 21 '12 at 21:37
  • @Slave I have PHP Version 5.3.10-1ubuntu3.4 – leticia Nov 21 '12 at 21:38
  • @Aerik I'm getting the value from a MySQL table with Collation: utf8_general_ci – leticia Nov 21 '12 at 21:42
  • @leticia2602 Ok - I think you still need to execute `SET NAMES UTF8;` in MySQL, and I think you should encode your PHP source code file in UTF-8 as well. – Aerik Nov 21 '12 at 21:50
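
One way to test the encoding guess from the comments above is to check whether the bytes you actually receive are valid UTF-8 before splitting. The sketch below assumes the mbstring extension is loaded; `to_utf8_assuming_latin1` is a hypothetical helper name, and treating every non-UTF-8 input as ISO-8859-1 is an assumption that will not hold for every data source.

```php
<?php
// Hypothetical helper, not part of PHP: returns $s as UTF-8, assuming that
// anything which is not already valid UTF-8 is ISO-8859-1.
function to_utf8_assuming_latin1($s) {
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s; // already valid UTF-8, leave untouched
    }
    return mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
}

$latin1 = "with\xE1";     // á as a single ISO-8859-1 byte
$utf8   = "with\xC3\xA1"; // á as two UTF-8 bytes

$fromLatin1 = to_utf8_assuming_latin1($latin1);
$fromUtf8   = to_utf8_assuming_latin1($utf8);

echo bin2hex($fromLatin1), "\n";
echo bin2hex($fromUtf8), "\n";
```

Both calls should end up with the same UTF-8 bytes, after which any of the character-splitting approaches in the answers can be applied.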