I'm trying to decode a Selenium server's response. The server returns:

{"sessionId":null,"status":0,"value":{"os":{"arch":"amd64","name":
"Windows Server 2008 R2","version":"6.1"},"java":{"version":"1.7.0_02"},
"build":{"revision":"15105","time":"2011-12-08 09:56:25","version":"2.15.0"}},
"class":"org.openqa.selenium.remote.Response","hCode":1813953336}

I'm trying to decode it with the following:

$json = json_decode($s->result);
echo '<pre>'.print_r($json, 1).'</pre>';

At this stage the $s object is:

Scrape Object
(
    [headers] => Array
        (
        )

    [result] => {"sessionId":null,"status":0,"value":{"os":{"arch":"amd64","name":"Windows Server 2008 R2","version":"6.1"},"java":{"version":"1.7.0_02"},"build":{"revision":"15105","time":"2011-12-08 09:56:25","version":"2.15.0"}},"class":"org.openqa.selenium.remote.Response","hCode":287101789}
    [http_code] => 200
    [error] => 
)

However, when I paste the result directly into json_decode(), it decodes just fine. Where am I going wrong?

cmbuckley
Saulius Antanavicius

1 Answer

I would guess that $s->result is the HTTP response body, and that it's not coming back as UTF-8–encoded data (so json_decode returns NULL). This is an issue on the server side, as JSON should be UTF-8–encoded. Ideally, the server would respond with a Content-Type header telling you the encoding of the response body.

However, you can work around the issue by calling utf8_encode on the response:

$json = json_decode(utf8_encode($s->result));
echo '<pre>' . print_r($json, 1) . '</pre>';

This will only work if the response is in ISO-8859-1. As an additional check, you may want to detect the encoding of the response using mb_detect_encoding. You can then pass the detected encoding to iconv as $sourceEncoding:

$json = json_decode(iconv($sourceEncoding, 'UTF-8', $s->result));
echo '<pre>' . print_r($json, 1) . '</pre>';
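
The detection step can be sketched as follows. This is only a minimal illustration: the helper name decodeJsonAsUtf8, the candidate encoding list, and the sample payload are assumptions rather than part of the original code, and it assumes the mbstring and iconv extensions are enabled.

```php
<?php
// Hypothetical helper: convert a response body to UTF-8 before decoding it.
// The candidate list is an assumption; adjust it to what the server may send.
function decodeJsonAsUtf8($body, array $candidates = array('UTF-8', 'ISO-8859-1'))
{
    // Strict mode rejects a candidate if the body contains bytes invalid in it.
    $sourceEncoding = mb_detect_encoding($body, $candidates, true);
    if ($sourceEncoding === false) {
        return null; // none of the candidate encodings matched
    }
    if ($sourceEncoding !== 'UTF-8') {
        $body = iconv($sourceEncoding, 'UTF-8', $body);
    }
    return json_decode($body);
}

// Made-up payload: "\xE9" is "é" in ISO-8859-1 and is not valid UTF-8 here.
$latin1 = "{\"name\":\"caf\xE9\"}";
$json = decodeJsonAsUtf8($latin1);
echo '<pre>' . print_r($json, 1) . '</pre>';
```

Detection is a heuristic, so if the server's actual encoding is known it is safer to pass it to iconv directly.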

If all else fails, have a look at the output of json_last_error:

if ($json === null) {
    var_dump(json_last_error());
}
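
A slightly fuller version of that check might look like the sketch below. The helper name decodeWithDiagnostics and the sample payload are made up for illustration, and json_last_error_msg() is only available from PHP 5.5 onwards.

```php
<?php
// Hypothetical helper: decode a JSON string and return the result together
// with the error code and message reported by the JSON extension.
function decodeWithDiagnostics($body)
{
    $json = json_decode($body);
    return array(
        'json'    => $json,
        'code'    => json_last_error(),
        // json_last_error_msg() requires PHP 5.5 or later.
        'message' => json_last_error_msg(),
    );
}

// Made-up payload with trailing NUL bytes, similar to the response in the question.
$result = decodeWithDiagnostics('{"status":0}' . "\0\0\0");
if ($result['json'] === null && $result['code'] !== JSON_ERROR_NONE) {
    echo 'json_decode failed (' . $result['code'] . '): ' . $result['message'] . "\n";
}
```

Checking the error code as well as the NULL result matters because json_decode also returns NULL for the valid JSON document null.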

EDIT: The error in this case was JSON_ERROR_CTRL_CHAR; the response contained a number of NUL characters, which were removed with str_replace("\0", '', $s->result).
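
Put together, the fix looks roughly like this. The $result string is a shortened, made-up stand-in for $s->result, padded with NUL bytes to mimic the response shown in the comments.

```php
<?php
// Shortened stand-in for $s->result: valid JSON followed by NUL padding.
$result = '{"sessionId":null,"status":0,"value":{"os":{"arch":"amd64"}}}'
    . str_repeat("\0", 28);

// Remove every NUL byte, then decode as usual.
$clean = str_replace("\0", '', $result);
$json = json_decode($clean);

echo '<pre>' . print_r($json, 1) . '</pre>';
```

If the NUL bytes only ever appear as trailing padding, rtrim($result, "\0") would be a narrower alternative that leaves the rest of the body untouched.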

cmbuckley
  • Hmmm… try checking the output of [json_last_error()](http://php.net/manual/en/function.json-last-error.php) if the result of the `json_decode` is `NULL`. – cmbuckley Dec 30 '11 at 00:25
  • @Saulius: [JSON_ERROR_CTRL_CHAR = 3](http://php.net/manual/en/function.json-last-error.php) -- Which indicates that most likely you have an encoding problem, as cbuckley indicated. – Conspicuous Compiler Dec 30 '11 at 00:35
  • @Saulius: JSON data must be UTF-8. – Felix Kling Dec 30 '11 at 00:37
  • It is also possible that the response contains carriage returns (it looks like it's coming from a Windows server). Try `str_replace(array("\r", "\n"), array('\r', '\n'), $s->result)`. – cmbuckley Dec 30 '11 at 00:39
  • Yeah, it must be a control character error. str_replace didn't help either. Is there a way to tell curl to return UTF-8? – Saulius Antanavicius Dec 30 '11 at 00:44
  • That's entirely down to the origin server. You need to know exactly what encoding it's sending back to you. Can you curl the response on the command line, piping through less, to see which control characters you're getting? – cmbuckley Dec 30 '11 at 00:53
  • What do you mean by doing it on the command line and piping through less? Hmm – Saulius Antanavicius Dec 30 '11 at 00:56
  • Something like `curl http://selenium-server.example.com/path/to/script | less -u` will show the control characters in the response body. – cmbuckley Dec 30 '11 at 01:00
  • Returned {"sessionId":null,"status":0,"value":{"os":{"arch":"amd64","name":"Windows Server 2008 R2","version":"6.1"},"java":{"version":"1.7.0_02"},"build":{"revision":"15105","time":"2011-12-08 09:56:25","version":"2.15.0"}},"class":"org.openqa.selenium.remote.Response","hCode":1250557032}^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@ ~ – Saulius Antanavicius Dec 30 '11 at 01:03
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/6252/discussion-between-cbuckley-and-saulius-antanavicius) – cmbuckley Dec 30 '11 at 01:07
  • Solution from chat was `str_replace("\0", '', $s->result)`. – cmbuckley Dec 30 '11 at 01:12