substr_replace encoding in PHP

Question

I want to write to a text file. When I use substr_replace() in PHP, the encoding changes and Greek characters are no longer written correctly. Without substr_replace(), everything is fine. How can I fix this?

<?php
    $file = "test.txt";
    $writeFile = fopen($file, "w+"); // Read/write
    $myarray = array("δφδφ", "δφδσφδσ", "δφδφδ");
    $myarray[0] = substr_replace($myarray[0], "ε", 0, 1);

    foreach ($myarray as $data) {
        fwrite($writeFile, $data . "\n");
    }
?>

Outcome

ε�φδφ
δφδσφδσ
δφδφδ

Outcome without any substr_replace()

δφδφ
δφδσφδσ
δφδφδ

You can try this multibyte function http://lv.php.net/manual/en/function.substr-replace.php#59544 — arma, Jun 28 '12 at 08:11
You must use multibyte functions to do what you want in this case. A plain `substr_replace` only works on the data as a binary string - without caring for encoding. — Christian, Jun 28 '12 at 08:35
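
The point made in these comments can be checked directly. The following is a minimal sketch, assuming the source file and the strings are UTF-8 encoded and the mbstring extension is loaded; it compares byte and character counts and shows the orphaned byte that the byte-level replacement leaves behind.

```php
<?php
// Assumption: this file is saved as UTF-8, so each Greek letter takes 2 bytes.
$word = "δφδφ";

// strlen() counts bytes; mb_strlen() counts characters.
$byteCount = strlen($word);
$charCount = mb_strlen($word, "UTF-8");

// Byte-oriented replacement: only the first byte of "δ" (0xCE) is replaced,
// so its second byte (0xB4) stays behind as an orphan.
$broken = substr_replace($word, "ε", 0, 1);

echo $byteCount, " bytes, ", $charCount, " characters\n"; // 8 bytes, 4 characters
echo bin2hex("δ"), "\n";                                   // ceb4
echo bin2hex($broken), "\n";                               // ceb5b4cf86ceb4cf86
var_dump(mb_check_encoding($broken, "UTF-8"));             // bool(false)
```

The stray b4 after ceb5 is the byte that shows up as � in the output file.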

score 18 · Answer 1 · edited Jan 23 '20 at 22:25

You can use these two functions:

From shkspr.mobi:

function mb_substr_replace($original, $replacement, $position, $length)
{
    $startString = mb_substr($original, 0, $position, "UTF-8");
    $endString = mb_substr($original, $position + $length, mb_strlen($original), "UTF-8");

    $out = $startString . $replacement . $endString;

    return $out;
}

From GitHub:

function mb_substr_replace($str, $repl, $start, $length = null)
{
    preg_match_all('/./us', $str, $ar);
    preg_match_all('/./us', $repl, $rar);
    // Default: replace through the end of the string.
    $length = is_int($length) ? $length : mb_strlen($str, 'UTF-8');
    array_splice($ar[0], $start, $length, $rar[0]);
    return implode($ar[0]);
}

I tried both and both work well.
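
As a usage check, here is a small sketch that applies the mb_substr approach to the array from the question. The function name mb_substr_replace_demo is made up for this example so that the block stays self-contained, and passing null as the length (meaning "to the end of the string") assumes PHP 5.4.8 or later.

```php
<?php
// Hypothetical helper name; same head + replacement + tail idea as above.
function mb_substr_replace_demo($original, $replacement, $position, $length)
{
    // Characters before the replaced span.
    $head = mb_substr($original, 0, $position, "UTF-8");
    // Characters after the replaced span; null length means "to the end".
    $tail = mb_substr($original, $position + $length, null, "UTF-8");

    return $head . $replacement . $tail;
}

$myarray = array("δφδφ", "δφδσφδσ", "δφδφδ");
$myarray[0] = mb_substr_replace_demo($myarray[0], "ε", 0, 1);

echo implode("\n", $myarray), "\n";
```

With UTF-8 input, the first line should come out as εφδφ, without the stray byte seen in the question.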

score 3 · Answer 2 · answered Jun 28 '12 at 08:22

Assuming the Greek text is stored in a multi-byte encoding such as UTF-8, this won't work, because PHP's core string functions, including substr_replace, are not multi-byte aware. They treat one byte as one character, so replacing only the first byte of a two-byte character slices that character in half. You need a more manual approach built on a multi-byte aware string function such as mb_substr:

mb_internal_encoding('UTF-8');
echo 'ε' . mb_substr('δφδφ', 1);

The user note that @arma links to in the comments wraps this approach in a function.
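
Applied to the script from the question, the same idea might look like the sketch below. It assumes UTF-8 source text, and it writes to a temporary file rather than test.txt so that it can be run anywhere; that substitution is only for illustration.

```php
<?php
mb_internal_encoding('UTF-8');

// Assumption: a temporary file stands in for test.txt from the question.
$file = tempnam(sys_get_temp_dir(), 'greek');
$myarray = array("δφδφ", "δφδσφδσ", "δφδφδ");

// Replace the first character: the new character plus everything from character 1 on.
$myarray[0] = 'ε' . mb_substr($myarray[0], 1);

$writeFile = fopen($file, "w");
foreach ($myarray as $data) {
    fwrite($writeFile, $data . "\n");
}
fclose($writeFile);

// Read the file back to confirm what was written, then clean up.
$written = file_get_contents($file);
echo $written;
unlink($file);
```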

score 3 · Answer 3 · answered Nov 08 '12 at 16:38

Try this version:

function mb_substr_replace ($string, $replacement, $start, $length = 0) 
{
    if (is_array($string)) 
    {
        foreach ($string as $i => $val)
        {
            $repl = is_array ($replacement) ? $replacement[$i] : $replacement;
            $st   = is_array ($start) ? $start[$i] : $start;
            $len  = is_array ($length) ? $length[$i] : $length;

            $string[$i] = mb_substr_replace ($val, $repl, $st, $len);
        }

        return $string;
    }

    $result  = mb_substr ($string, 0, $start, 'UTF-8');
    $result .= $replacement;

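    if ($length > 0) {
        // Resume right after the replaced span; an offset of $start + $length + 1 would skip one extra character.
        $result .= mb_substr ($string, ($start + $length), mb_strlen($string, 'UTF-8'), 'UTF-8');
    }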

    return $result;
}

answered Nov 08 '12 at 16:38 by Edson Medina

this function is buggy – evilReiko Jan 26 '14 at 14:45
Care to explain @evilReiko? – Edson Medina Jan 26 '14 at 22:50
I tried it, it worked fine, but then I noticed that sometimes it removes the first character after replacement – evilReiko Jan 27 '14 at 09:14
@evilReiko The first character of $replacement? – Edson Medina Jan 28 '14 at 09:22
I think so (it's been few days since I last tried this function) – evilReiko Jan 28 '14 at 09:40
I read in comments that this function is buggy. Is it right? Did you try it @EdsonMedina? – Mohammad Kermani Feb 25 '16 at 19:38
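
The symptom described in these comments, a character going missing right after the replacement, is what an off-by-one in the tail offset would produce. The sketch below is only an illustration of that effect on a made-up input, assuming UTF-8 strings; it compares a tail taken at $start + $length with one taken at $start + $length + 1.

```php
<?php
$string = "δφδσφ";
$start  = 1;
$length = 2;   // the span to replace is "φδ"

// Characters before the replaced span.
$head = mb_substr($string, 0, $start, 'UTF-8');

// Tail taken immediately after the replaced span.
$tailExact = mb_substr($string, $start + $length, null, 'UTF-8');
// Tail taken one character too late, which drops "σ".
$tailOffByOne = mb_substr($string, $start + $length + 1, null, 'UTF-8');

echo $head . "ε" . $tailExact, "\n";    // δεσφ
echo $head . "ε" . $tailOffByOne, "\n"; // δεφ
```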

score 0 · Answer 4 · edited Jul 26 '15 at 15:05

You could try using the mb_convert_encoding() function to set the correct encoding.

answered Jun 28 '12 at 08:07 by user1474090 · edited Jul 26 '15 at 15:05 by Gottlieb Notschnabel

score 0 · Answer 5 · edited Jan 23 '20 at 22:24

function replace($string, $replacement, $start, $length = 0)
{
    $result  = mb_substr($string, 0, $start, 'UTF-8');
    $result .= $replacement;

    if ($length > 0)
    {
        $result .= mb_substr($string, ($start + $length), null, 'UTF-8');
    }

    return $result;
}

answered Mar 26 '14 at 20:27 by Bald · edited Jan 23 '20 at 22:24 by Peter Mortensen
