
I am using PHP to display a page dynamically. However, some characters, for example ♥, are not displayed correctly. I am getting the JSON string using SimpleXML. When I do echo $string, it outputs âÂÂ¥. I then tried utf8_decode($string) and got â¥, which is still wrong. How do I convert this string so that it displays correctly when I write echo $string?

faeophyta

2 Answers


Try putting this in the <head> of your PHP file:

<meta http-equiv="content-type" content="text/html; charset=UTF-8"/>

For HTML5 (thanks to scootergrisen), you can also use:

<meta charset="utf-8">
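The meta tag only describes the markup. PHP can also announce the charset in the HTTP `Content-Type` header itself with `header()`, as long as that happens before any output is sent. A minimal sketch, where `render_page()` is a hypothetical helper and the page layout is an assumption:

```php
<?php
// Hypothetical helper: build a UTF-8 page and declare the charset twice,
// once in the HTTP header and once in the markup.
function render_page(string $name): string
{
    // header() only has an effect before the first byte of output is sent.
    if (!headers_sent()) {
        header('Content-Type: text/html; charset=utf-8');
    }
    // Escape for HTML, telling htmlspecialchars() that the input is UTF-8.
    $safe = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');

    return "<!DOCTYPE html>\n"
        . "<html>\n<head>\n<meta charset=\"utf-8\">\n</head>\n"
        . "<body>\n<p>{$safe}</p>\n</body>\n</html>\n";
}

// Usage: "\u{2665}" is the heart character, written as an escape.
echo render_page("[\u{2665}]swissolo");
```

Sending the header from PHP means the page no longer depends on the web server's default charset.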

Edit:

Well, it looks to me as if your API is not encoding the text correctly. That will make any attempt to decode your string fail (and leave you parsing '♥' yourself).

Your API encodes ♥ as \u00c3\u00a2\u00c2\u0099\u00c2\u00a5, which according to this seems invalid.

Therefore, the only (hackish) solution that I see right now would be to re-parse your API's response yourself, for example like this.
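The escape sequence above fits one particular mistake. ♥ is U+2665, whose UTF-8 bytes are E2 99 A5. Reading those bytes as ISO-8859-1 and encoding them to UTF-8 again gives C3 A2 C2 99 C2 A5, and escaping each of those bytes as \u00XX gives exactly \u00c3\u00a2\u00c2\u0099\u00c2\u00a5. That would also explain why a single utf8_decode() produced â¥: it removed only one of the two layers (U+0099 is an invisible control character). Assuming the API mangles every name the same way (not checked against the real API), the layers can be peeled off generically instead of one character at a time. `undo_latin1_layer()` and `repair_mojibake()` are hypothetical names, and the sketch deliberately avoids the mbstring and iconv extensions:

```php
<?php
// Undo one layer of "ISO-8859-1 bytes re-encoded as UTF-8".
// Every character must be U+0000..U+00FF (an ASCII byte, or 0xC2/0xC3
// followed by a continuation byte); otherwise return null, because the
// string cannot be the result of that mistake.
function undo_latin1_layer(string $s): ?string
{
    $out = '';
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($s[$i]);
        if ($b < 0x80) {
            $out .= $s[$i];                       // ASCII passes through
        } elseif (($b === 0xC2 || $b === 0xC3) && $i + 1 < $len
                  && (ord($s[$i + 1]) & 0xC0) === 0x80) {
            // Two-byte sequence for U+0080..U+00FF: rebuild the original byte.
            $out .= chr((($b & 0x1F) << 6) | (ord($s[$i + 1]) & 0x3F));
            $i++;
        } else {
            return null;                          // real non-Latin-1 text
        }
    }
    return $out;
}

// Peel off layers while the result is still valid UTF-8 (at most $maxLayers).
function repair_mojibake(string $s, int $maxLayers = 3): string
{
    for ($n = 0; $n < $maxLayers; $n++) {
        $fixed = undo_latin1_layer($s);
        // Stop when no layer can be removed, nothing changes, or the
        // result would not be valid UTF-8 (preg_match('//u') fails).
        if ($fixed === null || $fixed === $s || preg_match('//u', $fixed) !== 1) {
            break;
        }
        $s = $fixed;
    }
    return $s;
}

// Usage: the string json_decode() produces from the API's escapes.
$broken = json_decode('"\u00c3\u00a2\u00c2\u0099\u00c2\u00a5"');
echo repair_mojibake($broken), "\n"; // ♥
```

The catch is that a name which legitimately contains a pair such as Ã© would be "repaired" as well, so this is a stopgap until the API itself is fixed.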

Edit 2:

Whatever it is your API is doing, don't rely on it. You have all the data you need in your XML already (in an unescaped, UTF-8 format), so why not access it directly? :)

This might be the best thing to do without any hackish fixes:

$name = (string) $steamdata->steamID; // cast the SimpleXMLElement to a plain UTF-8 string
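A small self-contained check of that idea, using an inline XML snippet shaped like the profile feed (the root and element names are assumptions, not copied from the real response). SimpleXML returns CDATA content as ordinary UTF-8 text once the element is cast to a string:

```php
<?php
// Assumed shape of the profile XML; the real feed has many more elements.
// \u{2665} is the heart character, written as an escape inside the heredoc.
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<profile>
    <steamID><![CDATA[[\u{2665}]swissolo]]></steamID>
</profile>
XML;

$steamdata = simplexml_load_string($xml);
if ($steamdata === false) {
    throw new RuntimeException('Could not parse the XML');
}

// Casting yields a plain UTF-8 string with the CDATA wrapper removed.
$name = (string) $steamdata->steamID;

echo htmlspecialchars($name, ENT_QUOTES, 'UTF-8'), "\n";
```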
Chiru
  • Or this shorter version in your HTML5 file (if you're using HTML5): `<meta charset="utf-8">` – scootergrisen Mar 29 '14 at 01:21
  • Well, you could save an extra two characters by just writing `<meta charset=utf-8>`, since HTML5 treats quotes as optional. :) – Chiru Mar 29 '14 at 04:01
  • Please don't omit the quotes. It is horrible practice. – PeeHaa Mar 29 '14 at 10:56
  • @Chiru, unfortunately, neither your solution nor scootergrisen's worked. Anything else you can suggest? – faeophyta Mar 29 '14 at 16:28
  • That's weird. What does the `Content-Type` section of your server's HTTP header say? You can find out with `curl -sIL example.com | grep "Content-Type"`. In this section, you can force the server to use a specific charset. You might want to set it to `Content-Type: text/html; charset=utf-8`, which needs to be done in your webserver's configuration. – Chiru Mar 29 '14 at 18:26
  • @Chiru, This is what it returned: `Content-Type: text/html; charset=utf-8`. Anything else I can try to do to fix the problem? – faeophyta Mar 29 '14 at 21:09
  • This looks perfect! Does "♥" display in a new file on your server that looks like this? http://pastebin.com/wEPw6rvd If not, will it display for a new file that looks like this? http://pastebin.com/4J6wzhfG – Chiru Mar 29 '14 at 23:11
  • @Chiru, it seems like your edited answer will work. I haven't tried it yet, for one reason: the code has to work for any username the user decides to throw at it, and I can't have a line for every single incorrectly encoded character. Anything else you can suggest? – faeophyta Mar 31 '14 at 21:41
  • Alright, I see. The only way to resolve this for all Unicode characters would be to find out whether all Unicode characters are affected by a faulty conversion function in the API (just try some other characters, for instance). Other than that, you'd have to find out why the API thinks that `\u00c3\u00a2\u00c2\u0099\u00c2\u00a5` is an encoding for `♥` (maybe by comparing a few screw-ups and trying to find a pattern). As far as I can tell, this error happens on the server side (theirs), not yours. Best solution yet: file a ticket, this doesn't seem right! Have you seen it working anywhere? – Chiru Apr 01 '14 at 00:41
  • @Chiru, yes! [This website](http://backpack.tf/id/swissolo) displays the name correctly... Anything else you could suggest? Thank you for persevering in helping me... I appreciate it. :) – faeophyta Apr 01 '14 at 00:50
  • @faeophyta Well, what do you think about [this](http://pastebin.com/KtmJERV2)? :) – Chiru Apr 01 '14 at 01:01

First, make sure you save your file and serve it to the browser in the same encoding. For example, save your PHP file as UTF-8 and, in your HTML5 file, add <meta charset="utf-8"> to the <head> section.

<!DOCTYPE html>
<head>
   <meta charset="utf-8">
</head>

If it still doesn't work, it might be because you're using PHP functions that don't understand multibyte characters and treat every byte (8 bits) as one character.

Some of them have replacement functions, for example mb_substr() instead of substr(), provided the multibyte string extension (mbstring) is enabled in your PHP installation.
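To see the difference, compare the byte-based and character-based functions on a string that starts with a multibyte character. This sketch assumes the mbstring extension is enabled:

```php
<?php
$s = "\u{2665}abc"; // "♥abc": the heart is 3 bytes in UTF-8, then 3 ASCII bytes

// Byte-based: strlen() counts bytes, and substr() can cut a character in half.
$bytes      = strlen($s);                   // 6
$brokenHead = substr($s, 0, 1);             // "\xE2", only the first byte of ♥

// Character-based: the mb_* versions count characters in the given encoding.
$chars      = mb_strlen($s, 'UTF-8');       // 4
$head       = mb_substr($s, 0, 1, 'UTF-8'); // "♥"

printf("%d bytes, %d characters\n", $bytes, $chars); // 6 bytes, 4 characters
```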

Some functions have no multibyte replacement, but you can try writing one yourself.

I had problems with ucfirst() because there was no mb_ucfirst() (a native one was only added later, in PHP 8.4).

So instead of this, which gave me the same problem you have:

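function mb_ucfirst($tekst){
   // Broken: utf8_decode() turns every character outside ISO-8859-1 (such as ♥) into '?'.
   // utf8_encode()/utf8_decode() are also deprecated as of PHP 8.2.
   return utf8_encode(ucfirst(utf8_decode($tekst)));
}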

I use this:

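function my_mb_ucfirst($str){
    // Upper-case the first character (not the first byte) multibyte-safely.
    $fc = mb_strtoupper(mb_substr($str, 0, 1, 'UTF-8'), 'UTF-8');
    // Append the rest of the string unchanged.
    return $fc . mb_substr($str, 1, null, 'UTF-8');
}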

Maybe this can help you. Try looking through the code that manipulates the string and disable the lines one by one until the problem changes.
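Because the browser's rendering can hide what is really in the string, it may help to dump the raw bytes at each step while disabling lines. Note that mb_detect_encoding() reporting UTF-8 only means the bytes are valid UTF-8; text that has been wrongly re-encoded is valid UTF-8 too. A sketch, with `dump_bytes()` as a hypothetical helper:

```php
<?php
// Hypothetical debugging helper: show a string's bytes as hex pairs, so
// "what the browser shows" and "what PHP holds" can be told apart.
function dump_bytes(string $s): string
{
    return implode(' ', str_split(bin2hex($s), 2));
}

echo dump_bytes("\u{2665}"), "\n";           // e2 99 a5 (correct UTF-8 heart)
echo dump_bytes("\u{e2}\u{99}\u{a5}"), "\n"; // c3 a2 c2 99 c2 a5 (heart re-encoded once)
```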

scootergrisen
  • Thank you for your detailed reply. I looked through the code, and I do no string manipulation. All I do is get it from XML using SimpleXML, like so: `$data = simplexml_load_file($file);` Then I do: `$name = $data->response->players[0]->name;` When I `echo $name`, it returns `âÂÂ¥`. When I `echo mb_detect_encoding($name);`, it returns `UTF-8`. Do you have any idea of what I may be doing wrong? Thanks again for your help. – faeophyta Mar 29 '14 at 16:26
  • What about your XML file: is it also in UTF-8, and have you declared the encoding in that file too: **<?xml version="1.0" encoding="UTF-8"?>** – scootergrisen Mar 29 '14 at 22:50
  • Here's the XML file in question: [XML file](http://steamcommunity.com/id/swissolo/?xml=1). The XML document is also in UTF-8. – faeophyta Mar 29 '14 at 23:13
  • The heart is inside **<![CDATA[** and **]]>**. Try **echo $name->asXML();** – scootergrisen Mar 30 '14 at 01:26
  • nope, still doesn't work... prints `[âÂÂ¥]swissolo` – faeophyta Mar 30 '14 at 01:37
  • Make an example somewhere with all the code. Also try another browser and check your browser settings; the browser might have been set to use another encoding manually. – scootergrisen Mar 30 '14 at 02:23