PHP function substr() error

Question

When I use substr() I get a strange character at the end

$articleText = substr($articleText,0,500);

I have an output of 500 chars and � <--

How can I fix this? Is it an encoding problem? My language is Greek.

Have seen the same thing in (UK) English. – alimack Aug 25 '14 at 11:03

score 61 · Accepted Answer · answered Dec 29 '09 at 09:08

substr() counts bytes, not characters.

Greek text is almost certainly stored in a multi-byte encoding such as UTF-8, and cutting at a byte offset can split a character in two, which is what produces the � at the end.

mb_substr() should help here: the mb_* functions were created specifically for multi-byte encodings.
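
To make the byte/character difference concrete, here is a minimal sketch. It assumes the mbstring extension is loaded and that the file is saved as UTF-8; the sample string and variable names are only for illustration.

```php
<?php
// Assumption: mbstring is available and this source file is UTF-8 encoded.
$text = 'αβγδε'; // 5 Greek letters, 2 bytes each in UTF-8

// substr() works on bytes: 3 bytes is "α" plus the first half of "β".
$byteCut = substr($text, 0, 3);

// mb_substr() works on characters: 3 characters is "αβγ".
$charCut = mb_substr($text, 0, 3, 'UTF-8');

var_dump(strlen($text));             // length in bytes
var_dump(mb_strlen($text, 'UTF-8')); // length in characters
var_dump($charCut);
```

The dangling byte at the end of `$byteCut` is what a browser renders as �.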

answered Dec 29 '09 at 09:08

Pascal MARTIN

Learning more and more every single day... Thank you stackoverflow ! – Boris Delormas Dec 19 '11 at 10:07
Thank you very much. But as for me the main thing is to add `mb_internal_encoding("UTF-8");` before using `mb_*` functions. Without adding it I still see squares. – ivkremer Dec 27 '13 at 15:46
@Kremchik You won't see squares, if you use `mb_substr($short, 0, 75, 'utf-8')`. Then you don't need to use `mb_internal_encoding` before `mb_substr`. – trejder Jun 23 '14 at 12:39

score 20 · Answer 2 · edited Jan 29 '12 at 14:11

Use mb_substr() instead. Unlike substr(), which only works correctly on single-byte strings, it can handle multi-byte encodings:

$articleText = mb_substr($articleText,0,500,'UTF-8');

edited Jan 29 '12 at 14:11

hakre

answered Jan 29 '12 at 13:30

Uğur Özpınar

"UTF-8" part was important for me - don't forget it peeps! – Jul 10 '13 at 19:47
"UTF-8" as optional parameter worked for me. Keep in mind that you might also want to use mb_strlen() if you are using the string length to determine if it must be cut. – Kent Munthe Caspersen Jul 15 '13 at 11:20
An alternative is to use `mb_internal_encoding('utf-8')` before any `mb_*` command. – trejder Jun 23 '14 at 12:40
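
The mb_strlen() suggestion from the comments could look roughly like this. `truncate_chars` is a hypothetical helper name, not something from the thread, and the sketch assumes PHP 7 or later, the mbstring extension, and UTF-8 input.

```php
<?php
// Hypothetical helper: cut only when the text is longer than $limit characters.
// Assumption: $text is valid UTF-8 and mbstring is loaded.
function truncate_chars(string $text, int $limit, string $suffix = '…'): string
{
    // Compare character counts, not byte counts.
    if (mb_strlen($text, 'UTF-8') <= $limit) {
        return $text;
    }
    return mb_substr($text, 0, $limit, 'UTF-8') . $suffix;
}

echo truncate_chars('αβγδεζηθ', 5), "\n"; // longer than the limit, so it is cut
echo truncate_chars('αβγ', 5), "\n";      // short enough, returned unchanged
```

Checking with strlen() instead would trigger the cut too early for Greek text, since every letter counts as two bytes.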

score 6 · Answer 3 · answered Dec 29 '09 at 09:10

Looks like you're slicing a Unicode character in half there. Use mb_substr instead for Unicode-safe string slicing.
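
One way to confirm that a character really was sliced in half is to inspect the bytes and validate the result. This is only a sketch, assuming mbstring is loaded and the file is saved as UTF-8; the three-letter sample string is made up for the example.

```php
<?php
// Assumption: mbstring is available and this source file is UTF-8 encoded.
$text = 'αβγ'; // "α" is CE B1 and "β" is CE B2 in UTF-8

$broken = substr($text, 0, 3);             // stops after the first byte of "β"
$safe   = mb_substr($text, 0, 2, 'UTF-8'); // two whole characters

var_dump(bin2hex($broken));                    // shows the orphaned lead byte at the end
var_dump(mb_check_encoding($broken, 'UTF-8')); // is the byte-cut string valid UTF-8?
var_dump(mb_check_encoding($safe, 'UTF-8'));   // is the character-cut string valid UTF-8?
```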

answered Dec 29 '09 at 09:10

deceze

...either by calling `mb_internal_encoding('utf-8')` beforehand or by passing `'utf-8'` as the fourth parameter of `mb_substr`. The docs say the parameter is optional and that the internal character encoding is used when it is omitted, but the thing is (explained elsewhere in the PHP docs) that PHP's "internal encoding" is nearly always something other than your page encoding. So for slicing a UTF-8 string, either the fourth parameter or a call to `mb_internal_encoding('utf-8')` becomes required. – trejder Jun 23 '14 at 12:42
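
The two options mentioned in this comment can be compared side by side. A small sketch, assuming mbstring is loaded and the file is saved as UTF-8; which internal encoding applies by default depends on the PHP version and configuration, so the sketch sets it explicitly.

```php
<?php
// Assumption: mbstring is available and this source file is UTF-8 encoded.
$text = 'αβγδε';

// Option 1: pass the encoding on every call.
$a = mb_substr($text, 0, 2, 'UTF-8');

// Option 2: set the internal encoding once, then omit the fourth argument.
mb_internal_encoding('UTF-8');
$b = mb_substr($text, 0, 2);

// For contrast: with a single-byte encoding every byte counts as a character,
// so "2 characters" is only 2 bytes, i.e. a single Greek letter.
$c = mb_substr($text, 0, 2, 'ISO-8859-1');

var_dump($a === $b);
var_dump(strlen($c));
```

Passing the encoding explicitly keeps each call independent of global state, at the cost of repeating it everywhere.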

score 1 · Answer 4 · edited Feb 14 '15 at 02:53

Use this function; it worked for me:

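// Character-based substr() for UTF-8 strings; needs only PCRE, not mbstring.
function substr_unicode($str, $s, $l = null) {
    // The /u modifier makes preg_split() break the string into whole UTF-8 characters.
    return join("", array_slice(
        preg_split("//u", $str, -1, PREG_SPLIT_NO_EMPTY), $s, $l));
}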

Credits: http://php.net/manual/en/function.mb-substr.php#107698
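
A quick usage check of this helper against mb_substr(), with the function repeated so the snippet runs on its own. The sample string is made up, and the comparison line assumes mbstring is loaded; the helper itself does not need it.

```php
<?php
function substr_unicode($str, $s, $l = null) {
    return join("", array_slice(
        preg_split("//u", $str, -1, PREG_SPLIT_NO_EMPTY), $s, $l));
}

// Assumption: this source file is UTF-8 encoded.
$text = 'αβγδε';

var_dump(substr_unicode($text, 1, 3)); // three characters starting at offset 1
var_dump(substr_unicode($text, 2));    // everything from offset 2 onward

// Should agree with mb_substr() for valid UTF-8 input.
var_dump(substr_unicode($text, 1, 3) === mb_substr($text, 1, 3, 'UTF-8'));
```

Note that preg_split() with the /u modifier returns false on invalid UTF-8, so this helper is only a fit for input that is already known to be well-formed.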

edited Feb 14 '15 at 02:53

Kerem

answered May 07 '13 at 21:19

Moussawi7

score 0 · Answer 5 · answered Aug 18 '12 at 00:59

mb_substr() also works excellently for removing strange trailing line breaks, which I was having trouble with after parsing HTML code. The problem was NOT handled by:

 trim()

or:

 var_dump(preg_match('/^\n|\n$/', $variable));

or:

str_replace (array('\r\n', '\n', '\r'), ' ', $text)

None of these caught it.

score 0 · Answer 6 · answered Mar 30 '13 at 17:15

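Alternative solution for UTF-8 encoded strings: utf8_decode() converts the UTF-8 string to single-byte ISO-8859-1 before cutting the substring. Characters that have no ISO-8859-1 equivalent, such as Greek letters, come out as ?.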

$articleText = substr(utf8_decode($articleText),0,500);

To get the articleText string back to UTF-8, an extra operation will be needed:

$articleText = utf8_encode( substr(utf8_decode($articleText),0,500) );

answered Mar 30 '13 at 17:15

Kristoffer Bohmann

This doesn't work at all. – gre_gor Apr 06 '22 at 11:16

score 0 · Answer 7 · answered Oct 27 '14 at 12:52

You are cutting through a Unicode character. Instead of substr(), try mb_substr() in PHP.

substr()

substr ( string $string , int $start [, int $length ] )

mb_substr()

mb_substr ( string $str , int $start [, int $length [, string $encoding ]] )

For more information for substr() - Credits => Check Here
