How to prevent showing the diamond question mark symbol, even using mb_substr and utf-8

Question

I have read some other questions and tried their answers, but nothing worked. What I get is, for example, this:

Μήπως θα έπρεπε να � ...

and I can't remove that weird question mark. I am fetching the content of an RSS feed that is itself declared as <?xml version="1.0" encoding="UTF-8"?>, and the content is in Greek.

Is there any way to fix this?

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

<div><?php
    $entry->description = strip_tags($entry->description);
    echo mb_substr($entry->description, 0, 490);
?> ...</div>

What is `$entry`? Could the issue be the encoding used to store the description text? – Abdullah Jibaly, Jul 10 '11 at 05:20
I have updated my question. It gets the content of a feed. – EnexoOnoma, Jul 10 '11 at 05:23
The "funny question mark" is a real character, called the REPLACEMENT CHARACTER. It probably got added to the data because the stream from your feed was not legal UTF-8, that is, it could not be decoded. Can you show us the content of the string $entry like Abdullah suggests? Preferably as a byte sequence, not a char sequence? And are you sure the original feed data was encoded in UTF-8? – Ray Toal, Jul 10 '11 at 05:23
Do you get the same encoding error if you don't use `mb_substr`? – Abdullah Jibaly, Jul 10 '11 at 05:25
When I echo it without mb_substr I don't get the question mark. This is the feed I use: http://feeds.feedburner.com/blogspot/hyMBI – EnexoOnoma, Jul 10 '11 at 05:38
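
For anyone wanting to follow the suggestion in the comments and look at bytes rather than characters, here is a minimal sketch. The `$description` value is a hypothetical stand-in for `$entry->description`, written as escaped bytes so the example does not depend on how the file itself is saved.

```php
<?php
// Hypothetical stand-in for $entry->description: "Μή" written as raw UTF-8 bytes.
$description = "\xCE\x9C\xCE\xAE";

// Hex dump of the bytes, two hex digits per byte.
$hex = bin2hex($description);
echo $hex, "\n"; // ce9cceae

// true only if every byte sequence in the string is well-formed UTF-8.
$valid = mb_check_encoding($description, 'UTF-8');
var_dump($valid); // bool(true)
```

If the check returns false for the raw feed text, the data was damaged before truncation; if it only fails after `mb_substr`, the truncation step is the likely culprit.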

score 18 · Accepted Answer · answered Jul 10 '11 at 23:26

This is the answer: pass the encoding explicitly as the fourth argument to mb_substr.

mb_substr($entry->description, 0, 490, "UTF-8");
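
A rough sketch of why the fourth argument matters. When it is omitted, `mb_substr` falls back to the value of `mb_internal_encoding()`; if that happens to be a single-byte encoding, each byte counts as one character and the cut can land in the middle of a two-byte Greek letter. The sample text below is hypothetical, and `'ISO-8859-1'` is passed explicitly only to simulate such a default (this assumes the file is saved as UTF-8).

```php
<?php
// Hypothetical sample text standing in for $entry->description.
$text = 'Μήπως θα έπρεπε να δούμε';

// Simulated single-byte default: 5 "characters" means 5 bytes,
// which splits the third two-byte Greek letter in half.
$broken = mb_substr($text, 0, 5, 'ISO-8859-1');

// With the encoding stated, the cut falls on a character boundary.
$clean = mb_substr($text, 0, 5, 'UTF-8');

var_dump(mb_check_encoding($broken, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($clean, 'UTF-8'));  // bool(true)
```

Calling `mb_internal_encoding('UTF-8')` once at the top of the script should have the same effect for every later `mb_*` call that omits the encoding.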

EnexoOnoma

espradley · Answer 2 · 2015-05-15T14:57:48.463

12

I believe the issue is with your encoding. You're outputting UTF-8, but your browser cannot interpret one of the characters. The question mark symbol, as I have known it in the past, is actually generated by the browser, so there is nothing to search and replace. It's about fixing your encoding or eliminating unknown characters from the string before outputting it.

If you have access to the source of the data, you may want to check the DB settings to make sure it's encoded properly. If not, you'll have to find some way to convert the data using PHP, which is not an easy task.

Perhaps:

mb_convert_encoding($string, "UTF-8");
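
To illustrate the "eliminating unknown characters" option: with only two arguments, `mb_convert_encoding` takes the source encoding from `mb_internal_encoding()`. A commonly used variant names UTF-8 as both source and target, so that malformed byte sequences are substituted or dropped. The sketch below assumes PHP 7 or later; `scrub_utf8` is a hypothetical helper name, not something provided by PHP or by this answer.

```php
<?php
// Hypothetical helper: returns $text with malformed UTF-8 sequences removed.
function scrub_utf8(string $text): string
{
    $previous = mb_substitute_character();  // remember the current global setting
    mb_substitute_character('none');        // drop bad bytes instead of emitting "?"
    $clean = mb_convert_encoding($text, 'UTF-8', 'UTF-8');
    mb_substitute_character($previous);     // restore the global setting
    return $clean;
}

// Valid Greek text followed by a stray lead byte, as a bad truncation would leave it.
$damaged = "Μήπω\xCF";
$clean = scrub_utf8($damaged);

var_dump(mb_check_encoding($damaged, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($clean, 'UTF-8'));   // bool(true)
```

This only hides the symptom at output time; if the bytes are being cut in the wrong place upstream, fixing that step is the cleaner solution.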

edited May 15 '15 at 14:57

answered Jul 10 '11 at 05:39

espradley

+1 Looks like you sent the OP in the right direction with the "UTF-8" argument; not sure why someone would downvote this. – Abdullah Jibaly Jul 11 '11 at 04:33
Thank you espradley. If I could upvote this 7000 times, I would. I have escaped charset jail. This works for fixing things at template level. – Tom Feb 26 '15 at 19:02

score 0 · Answer 3 · answered Jul 10 '11 at 05:34

Have you tried using these seemingly redundant multibyte-safe string functions, which are not in the PHP core?

http://code.google.com/p/mbfunctions/

It appears they offer an mb_strip_tags() function like this:

if (! function_exists('mb_strip_tags'))
{
   function mb_strip_tags($document,$repl = ''){
      $search = array('@<script[^>]*?>.*?</script>@si',  // Strip out javascript
                     '@<[\/\!]*?[^<>]*?>@si',            // Strip out HTML tags
                     '@<style[^>]*?>.*?</style>@siU',    // Strip style tags properly
                     '@<![\s\S]*?--[ \t\n\r]*>@'         // Strip multi-line comments including CDATA
      );
      $text = mb_preg_replace($search, $repl, $document);
      return $text;
   }
}

AlienWebguy

Since I am something of a beginner, how do I use that file that I have to download into my cPanel? – EnexoOnoma Jul 10 '11 at 05:41
Just download it from the link I provided, upload it to your server with the rest of your PHP files, and include it with include_once('mbfunctions-whatever.php'); – AlienWebguy Jul 10 '11 at 05:51
OK, I did it, but now the content is replaced by question marks. – EnexoOnoma Jul 10 '11 at 06:23
Do you have multibyte string installed on your server? http://www.php.net/manual/en/mbstring.setup.php – AlienWebguy Jul 10 '11 at 06:35
