5

I'm sending a JSON POST body to my PHP web service that looks something like this:

{
    "foo": "☺"
}

When I echo out the body in the PHP, I see this:

{
    "foo":"\xe2\x98\xba"
}

I've also tried sending the \uXXXX equivalent:

{
    "foo": "\u263a"
}

This got further, in that the raw JSON string received had "foo":"\\u263a", but after json_decode the value turned to \xe2\x98\xba.

This is causing problems when I come to use the value in a JSON response. I get:

json_encode(): Invalid UTF-8 sequence in argument

At its simplest, this is what happens when I try to JSON encode the string:

> php -r 'echo json_encode("\x98\xba\xe2");'
PHP Warning:  json_encode(): Invalid UTF-8 sequence in argument in Command line code on line 1

My question is: how can I best get this smiley face from one end of my application to the other?

I'd appreciate any help you could offer.

Ross McFarlane

3 Answers

2

I believe this is the correct behavior of json_encode. If you use the following:

<script>
    alert(
     <?php
       $a = "☺";
       echo json_encode($a);
     ?>
    );
</script>

The HTML output will be alert("\u263a"); and the alert will show ☺, since "\u263a" is a correct representation of the string in JavaScript.

Passing the JSON_UNESCAPED_UNICODE constant as the second argument to json_encode is also an option, but it is only available in PHP 5.4.0 or newer.
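To make the difference concrete, here is a minimal sketch comparing the default output with the JSON_UNESCAPED_UNICODE output. It assumes PHP 5.4 or newer, and the variable names are only illustrative:

```php
<?php
// U+263A (WHITE SMILING FACE) written as its three UTF-8 bytes.
$smiley = "\xe2\x98\xba";

// Default behaviour: non-ASCII characters are escaped as \uXXXX.
$escaped = json_encode($smiley);

// With the flag, the UTF-8 bytes are emitted unchanged.
$unescaped = json_encode($smiley, JSON_UNESCAPED_UNICODE);

echo $escaped, "\n";   // "\u263a"
echo $unescaped, "\n"; // "☺"

// Both forms decode back to the same string.
$same = json_decode($escaped) === json_decode($unescaped);
var_dump($same); // bool(true)
```

Either form is valid JSON, so a conforming client should not care which one it receives.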

In what scenario do you intend to use the value?


Edit:

php -r 'echo json_encode("\x98\xba\xe2");'

PHP Warning: json_encode(): Invalid UTF-8 sequence in argument in Command line code on line 1

The problem is that the bytes are in the wrong order. It should be

echo json_encode("\xe2\x98\xba"); // this works for me

instead of

echo json_encode("\x98\xba\xe2"); 
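A small sketch of the two cases side by side, in case it helps. The variable names are mine, and the exact failure mode depends on the PHP version: as far as I know, PHP 5.5 and later return false without a warning, while older versions emit the warning shown above.

```php
<?php
$correct  = "\xe2\x98\xba"; // E2 98 BA: the UTF-8 encoding of U+263A
$shuffled = "\x98\xba\xe2"; // same bytes reordered: starts with a continuation byte

$ok = json_encode($correct);
echo $ok, "\n"; // "\u263a"

// The @ only silences the warning that older PHP versions emit.
$bad = @json_encode($shuffled);
$err = json_last_error();

// JSON_ERROR_UTF8 means the input was not well-formed UTF-8.
var_dump($err === JSON_ERROR_UTF8); // bool(true)
```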
Mifeet
  • I think you're on to something here. The value needs to be returned as JSON, and that's where I'm having trouble. – Ross McFarlane Jun 03 '13 at 11:56
  • @rossmcf So you want to send a string with that character as a JSON response from PHP, right? And what is the trouble? If the JSON response is processed by JavaScript, it should behave correctly even if the result is `\u263a` instead of `☺`. – Mifeet Jun 03 '13 at 11:59
  • The trouble is that json_encode won't encode `'\x98\xba\xe2'`, at least in my version of PHP. – Ross McFarlane Jun 03 '13 at 12:00
  • @rossmcf I think the problem is somewhere in the code you're not showing us. My guess is there is an inappropriate double conversion. – Mifeet Jun 03 '13 at 12:03
  • I've just added some more detail to the question. It seems to be a problem with json_encode when dealing with the internal representation of the string, i.e. '\x98\xba\xe2'. – Ross McFarlane Jun 03 '13 at 12:04
2

PHP's json_decode() function behaves correctly given your input case, returning the sequence of UTF-8 bytes (E2 98 BA) that represent the character.

However, Apache HTTPD applies the \x escaping (in function ap_escape_logitem()) before writing the line to the error log (as you did for testing purposes using error_log()). As noted in file server/gen_test_char.c, "all [...] 8-bit chars with the high bit set" are escaped.
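One way to check this without going through the error log is to dump the decoded bytes as hex. The following is only a sketch: the two bodies are the ones from the question, and the variable names are hypothetical.

```php
<?php
// The two request bodies from the question: a raw UTF-8 smiley and the \u escape.
$rawBody     = "{\"foo\": \"\xe2\x98\xba\"}";
$escapedBody = '{"foo": "\u263a"}';

$a = json_decode($rawBody, true);
$b = json_decode($escapedBody, true);

// bin2hex() shows the actual bytes, with no log-style \x escaping in the way.
echo bin2hex($a['foo']), "\n"; // e298ba
echo bin2hex($b['foo']), "\n"; // e298ba

// Re-encoding the decoded value works, because the bytes are valid UTF-8.
$roundTrip = json_encode($b);
echo $roundTrip, "\n"; // {"foo":"\u263a"}
```

If both lines print e298ba, the string in memory is fine and the \xe2\x98\xba you saw is just how the log displays it.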

PleaseStand
1

I think you have to pass JSON_UNESCAPED_UNICODE when you encode: json_encode(array("foo" => "☺"), JSON_UNESCAPED_UNICODE)

json_encode only works on UTF-8 strings, so check the encoding of the string before you encode it, like this:

mb_check_encoding("your string", 'UTF-8');

If it returns false, you can convert the string to UTF-8 (this assumes it is currently ISO-8859-1) using

utf8_encode("your string");
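A rough sketch of the check, and of what the conversion does to bytes that were never ISO-8859-1 in the first place. It assumes the mbstring extension is loaded. mb_convert_encoding() from ISO-8859-1 stands in here for utf8_encode(), which performs the same conversion but is deprecated as of PHP 8.2.

```php
<?php
$valid   = "\xe2\x98\xba"; // well-formed UTF-8 for U+263A
$invalid = "\x98\xba\xe2"; // the same bytes in the wrong order

$validOk   = mb_check_encoding($valid, 'UTF-8');
$invalidOk = mb_check_encoding($invalid, 'UTF-8');
var_dump($validOk);   // bool(true)
var_dump($invalidOk); // bool(false)

// Treating the bytes as ISO-8859-1 turns each byte into its own character,
// so the result is valid UTF-8 but no longer a smiley.
$converted = mb_convert_encoding($invalid, 'UTF-8', 'ISO-8859-1');
$json = json_encode($converted);
echo $json, "\n"; // "\u0098\u00ba\u00e2"
```

So the conversion only helps when the data really is ISO-8859-1; it cannot repair bytes that are simply out of order.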
Arun Killu
  • Thanks Arun. When I tried your suggestion, json_encode outputted: "\u0098\u00ba\u00e2", which is three other characters altogether. – Ross McFarlane Jun 03 '13 at 12:10