I need to shuffle a long UTF-8 encoded multibyte string in PHP, with a fair degree of randomness, without dropping or repeating any of the characters.

The PHP manual page for str_shuffle has a user-submitted multibyte function (the first one in the comments), but it doesn't work. If I pass it a string of, for example, 120 characters containing all the Japanese hiragana and katakana, I get back a string of 119 or 118 characters. Sometimes I've also seen duplicate characters even though the original string has none. So that function isn't usable.

To make this more complex, I also need to include, if possible, newlines, line feeds, and Japanese punctuation in the UTF-8 string.

Can anyone with experience handling UTF-8 multibyte strings in multiple languages help? Does PHP have any built-in functions to do this? str_shuffle is EXACTLY what I want; I just need it to work on multibyte characters as well.

Thanks very much!

Dave

3 Answers

Try splitting the string into an array of characters using mb_strlen and mb_substr, then calling shuffle on that array before joining it back together again. (Edit: As also demonstrated in @Frosty Z's answer.)

An example from the PHP interactive prompt:

php > $string = "Pretend I'm multibyte!";
php > $len = mb_strlen($string, 'UTF-8');
php > $sploded = array();
php > while($len-- > 0) { $sploded[] = mb_substr($string, $len, 1, 'UTF-8'); }
php > shuffle($sploded);
php > echo join('', $sploded);
rmedt tmu nIb'lyi!eteP

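Be sure to specify the encoding where appropriate: pass it explicitly (for example 'UTF-8') to mb_strlen and mb_substr, or set it once with mb_internal_encoding. Otherwise those functions fall back to the internal encoding, which may not match your data.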
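
As a rough sketch of the steps above, the split, shuffle, and join can be wrapped in one function that passes the encoding explicitly. The function name mb_shuffle_chars and the sample string are made up for illustration, and the check at the end only confirms that the character count is unchanged.

```php
<?php
// Hypothetical helper (not part of PHP): shuffle a string character by character.
function mb_shuffle_chars(string $string, string $encoding = 'UTF-8'): string
{
    $chars = [];
    $length = mb_strlen($string, $encoding);
    for ($i = 0; $i < $length; $i++) {
        // mb_substr counts characters, not bytes, so multibyte characters stay whole.
        $chars[] = mb_substr($string, $i, 1, $encoding);
    }
    shuffle($chars);
    return implode('', $chars);
}

// Sample input with hiragana, katakana, Japanese punctuation, and a newline.
$original = "あいうえお、カキクケコ。\n";
$shuffled = mb_shuffle_chars($original);

// The length in characters should be the same before and after shuffling.
var_dump(mb_strlen($shuffled, 'UTF-8') === mb_strlen($original, 'UTF-8'));
```

Newlines and punctuation need no special handling here, since each is just another character in the array.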

Charles
  • This was EXACTLY what I was looking for. You should include it in the PHP str_shuffle page. – Dave Mar 24 '11 at 00:58

This should do the trick, too. I hope.

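// "String" is a reserved class name as of PHP 7, so the class is named MbString here.
class MbString
{

    // Returns the characters of $string in random order.
    public function mbStrShuffle($string)
    {
        $chars = $this->mbGetChars($string);
        shuffle($chars);
        return implode('', $chars);
    }

    // Splits $string into an array of single UTF-8 characters.
    public function mbGetChars($string)
    {
        $chars = [];

        // Use the same encoding for the length and for each substring.
        for($i = 0, $length = mb_strlen($string, 'UTF-8'); $i < $length; ++$i)
        {
            $chars[] = mb_substr($string, $i, 1, 'UTF-8');
        }

        return $chars;
    }

}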
Anthony Rutledge

I like to use this function (mb_str_split requires PHP 7.4 or later):

function mb_str_shuffle($multibyte_string = "abcčćdđefghijklmnopqrsštuvwxyzžß,.-+'*?=)(/&%$#!~ˇ^˘°˛`˙´˝") {
    $characters_array = mb_str_split($multibyte_string);
    shuffle($characters_array);
    return implode('', $characters_array); // or join('', $characters_array); if you have a death wish (JK)
}
  1. Split the string into an array of multibyte characters
  2. Shuffle the array, which doesn't care whether its elements are multibyte
  3. Join the shuffled array back together into a string

Of course, I normally wouldn't have a default value for the function's parameter.
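
On PHP versions before 7.4, where mb_str_split is not available, one possible fallback is to split with preg_split and the u modifier instead. This is only a sketch of that swapped-in technique: the function name mb_str_shuffle_compat is hypothetical, and the input is assumed to be valid UTF-8.

```php
<?php
// Hypothetical variant for PHP versions that lack mb_str_split() (added in PHP 7.4).
function mb_str_shuffle_compat(string $string): string
{
    // The "u" modifier makes PCRE treat the subject as UTF-8, so the empty
    // pattern splits between characters rather than between bytes.
    $chars = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        // preg_split() returns false on failure, e.g. for invalid UTF-8 input.
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }
    shuffle($chars);
    return implode('', $chars);
}

echo mb_str_shuffle_compat("ひらがな、カタカナ。"), "\n";
```

Like mb_str_split, this splits on code points, so a base character and a following combining mark can end up separated after the shuffle.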

s3c