My browsers (Chrome and Firefox) do not display the umlaut "ö" correctly once I concatenate each byte of a string containing it with another string.

// words inside string with umlaute, later add http://www.lageplan23.de instead of "zahnstocher" as the correct solution
$string = "apfelsaft siebenundvierzig zahnstocher gelb ethereum österreich";

// get length of string
$l = mb_strlen($string);

$f = '';
// loop through length and output each letter by itself
for ($i = 0; $i <= $l; $i++){
    // umlaute buggy when there is a concatenation
    $f .= $string[$i] . " ";
}

var_dump($f);

When I replace $string[$i] . " "; with $string[$i]; everything works as expected.

Why is that and how can I fix it so I can concatenate each letter with another string?

user1711384
  • It seems that when the bytes of a multi-byte character are placed next to each other, they are interpreted as a single character. For example, when the bytes with values 195 and 182 are adjacent, the hex becomes C3B6, which is the UTF-8 representation of that character (https://www.cogsci.ed.ac.uk/~richard/utf-8.cgi?input=00F6&mode=hex). When they are separated by a space, they are displayed as other characters instead. – kiner_shah Nov 12 '21 at 11:17
  • Related: https://stackoverflow.com/a/27993420/4688321 – kiner_shah Nov 12 '21 at 11:17
  • Questions like this always send me back to this article on how the string type is "broken" -- it's a good reminder of how the bytes are stored and how the interpretation of multi-byte characters can sometimes be flawed. https://mortoray.com/2013/11/27/the-string-type-is-broken/ – Everett Nov 12 '21 at 12:31

1 Answer

In PHP, a string is a series of bytes. The documentation clumsily refers to those bytes as characters at times:

"A string is series of characters, where a character is the same as a byte. This means that PHP only supports a 256-character set, and hence does not offer native Unicode support."

And then later:

"It has no information about how those bytes translate to characters, leaving that task to the programmer."

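Using mb_strlen instead of strlen is the correct way to get the number of actual characters in a string (assuming a sane byte order and internal encoding to begin with). However, array notation such as $string[$i] is wrong here because it accesses only individual bytes, not characters. Once a space is inserted between the two bytes that encode "ö", they no longer form a valid sequence, so the browser shows replacement characters.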
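To make the byte/character distinction concrete, here is a minimal sketch. It assumes the source file is saved as UTF-8 and the mbstring extension is loaded; the variable names are only illustrative.

```php
<?php
// Assumption: this file is saved as UTF-8 and mbstring is available.
$s = "ö";

$byteCount = strlen($s);               // counts bytes
$charCount = mb_strlen($s, 'UTF-8');   // counts characters
$hex       = bin2hex($s);              // the bytes that encode "ö"

// Array notation returns single bytes. Putting a space between
// them breaks the two-byte UTF-8 sequence.
$split        = $s[0] . " " . $s[1] . " ";
$splitIsValid = mb_check_encoding($split, 'UTF-8');

// Concatenating the bytes directly keeps the sequence intact,
// which is why the version without the space appears to work.
$joined        = $s[0] . $s[1];
$joinedIsValid = mb_check_encoding($joined, 'UTF-8');

var_dump($byteCount, $charCount, $hex, $splitIsValid, $joinedIsValid);
// int(2), int(1), string(4) "c3b6", bool(false), bool(true)
```

This also suggests why the original loop only looked correct without the space: the two bytes happened to end up next to each other again in the output.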

The proper way to do what you want is to split the string into characters using mb_str_split:

// words inside string with umlaute, later add http://zahnstocher47.de instead of "zahnstocher" as the correct solution
$string = "apfelsaft siebenundvierzig zahnstocher gelb ethereum österreich";

// get the number of characters (not bytes)
$l = mb_strlen($string);
// split the string into an array of characters
$chars = mb_str_split($string);

$f = '';
// loop through the characters and append each one followed by a space
for ($i = 0; $i < $l; $i++){
    // $chars[$i] is a whole character, so umlauts survive the concatenation
    $f .= $chars[$i] . " ";
}

var_dump($f);

Demo here: https://3v4l.org/JIQoE
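If the goal is only to put a separator between characters, the loop can probably be replaced with implode. The sketch below is one possible variant, not the code from the demo; it uses a shortened sample string, assumes a UTF-8 source file, and relies on mb_str_split, which needs PHP 7.4 or later. For older versions, preg_split with the u modifier is a commonly used fallback.

```php
<?php
// Assumption: UTF-8 source file; shortened sample string for illustration.
$string = "gelb österreich";

// PHP 7.4+: split into characters, passing the encoding explicitly.
$chars = mb_str_split($string, 1, 'UTF-8');

// Fallback for older PHP versions: split on the empty pattern in
// UTF-8 mode and drop the empty leading/trailing pieces.
$charsRegex = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);

// Join the characters with a space between them.
$f = implode(" ", $chars);

var_dump($f);
```

Unlike the loop, implode does not append a trailing space after the last character, which may or may not matter for the intended output.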

Chris Haas