So I am using substr to limit how much of a news article I show. Here is the code:

substr(strip_tags($news['content']),0,$content_length) . '...';

The problem happens only once in a blue moon, when the cut falls near an apostrophe. I get the following output: `Hoye&#...`. In this case, `$news['content']` is "Hoye's Pharmacy will be closed.....", and `$content_length` happens to be 8. Someone suggested trying `mb_substr`, but that did not fix the problem.

hakre
Roeland
  • How are you confirming that the string is `Hoye's`? Seems to me it's `Hoye&#39;s` instead (or something similar). Please check the raw output, not the output as filtered through a browser. – deceze Nov 04 '11 at 03:38
  • Yes, I assume this is probably what is happening.. hmm solution? – Roeland Nov 04 '11 at 03:42
  • Please see: [Wordwrap / Cut Text in HTML string](http://stackoverflow.com/a/8494901/367456) – hakre Nov 24 '12 at 21:06

1 Answer

Looks like the content contains encoded HTML character entities, and your substring happens to chop the string in the middle of one of them.

e.g.

$string = "Hello &amp; Goodbye";
$broken = substr($string, 0, 8); // "Hello &a"

If you view the full string in a browser, it translates the encoded entity to its display form, so you see the real character. Once `substr` chops the entity in half, the browser can no longer translate it, and you see the partial `&#xxx` fragment instead.
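One way to avoid cutting through an entity is to decode the text before truncating and re-encode it afterwards. The following is only a sketch: the function name `truncate_text` is hypothetical, and it assumes UTF-8 content, PHP 7 or later, and the mbstring extension.

```php
<?php
// Hypothetical helper, not part of the original code: decode entities,
// cut on whole characters, then re-encode for safe HTML output.
function truncate_text(string $html, int $length, string $suffix = '...'): string
{
    // Strip tags first, then decode. ENT_QUOTES is passed explicitly so that
    // single-quote entities such as &#039; are decoded as well.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Short enough already: no cut and no suffix.
    if (mb_strlen($text, 'UTF-8') <= $length) {
        return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    }

    // Cut on character boundaries, so no entity or multibyte character is split.
    $cut = mb_substr($text, 0, $length, 'UTF-8');

    // Re-encode so the result can be echoed into HTML.
    return htmlspecialchars($cut, ENT_QUOTES, 'UTF-8') . $suffix;
}

echo truncate_text("Hoye&#039;s Pharmacy will be closed", 8); // Hoye&#039;s P...
```

Here the length counts visible characters rather than bytes of encoded markup, so the apostrophe counts as one character and the cut lands after the "P".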

Marc B
  • Suggested fix: look for the next space after the position and chop there. – Hamish Nov 04 '11 at 03:41
  • You could decode the string (`html_entity_decode()`) so only 'real' characters are in there, and then use `mb_substr()` on that, so it won't split up Unicode characters as `substr()` does. But this could break code farther down the line that's expecting encoded characters that are no longer there. – Marc B Nov 04 '11 at 03:44
  • Tried `mb_substr(strip_tags(html_entity_decode($news['content'])), 0, $content_length)`.. no cigar :( – Roeland Nov 04 '11 at 04:15
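
A possible reason the last attempt still failed, offered as an assumption rather than a confirmed diagnosis: on PHP versions before 8.1, `html_entity_decode()` defaults to `ENT_COMPAT`, which leaves single-quote entities such as `&#039;` undecoded unless `ENT_QUOTES` is passed. The sketch below combines that flag with the "chop at the next space" suggestion from the comments. The function name `truncate_at_space` is hypothetical, and UTF-8 content, PHP 7 or later, and the mbstring extension are assumed.

```php
<?php
// Hypothetical helper: cut at the first space at or after the requested length,
// so neither a word nor an entity is split.
function truncate_at_space(string $html, int $length, string $suffix = '...'): string
{
    // Decode with ENT_QUOTES so &#039; becomes a real apostrophe.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Short enough already: return the whole text, re-encoded.
    if (mb_strlen($text, 'UTF-8') <= $length) {
        return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    }

    // Find the first space at or after the cut position.
    $space = mb_strpos($text, ' ', $length, 'UTF-8');
    if ($space === false) {
        // No later space: the remainder is one word, so keep everything.
        return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    }

    $cut = mb_substr($text, 0, $space, 'UTF-8');
    return htmlspecialchars($cut, ENT_QUOTES, 'UTF-8') . $suffix;
}

echo truncate_at_space("Hoye&#039;s Pharmacy will be closed", 8); // Hoye&#039;s Pharmacy...
```

The result can be longer than the requested length, because the cut moves forward to the end of the current word.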