32

In PHP, I use json_encode() to echo arrays in HTML5 data attributes. Since JSON requires, and json_encode() generates, values enclosed in double quotes, I therefore wrap my data attributes in single quotes, like:

<article data-tags='["html5","jquery","php","test's"]'>

As you can see, the last tag (test's) contains a single quote, and using json_encode() with no options leads to parsing problems.

So I use json_encode() with the JSON_HEX_APOS option, and parsing is fine because my single quotes are encoded. But I wonder: is there a downside to doing it like this?

lorem monkey
Jérémy F.
  • You mean downside in the sense that it works? – hakre Jan 12 '12 at 09:18
  • I mean downside in the sense of "unexpected side effects that hexadecimal encoding might produce" – Jérémy F. Jan 12 '12 at 09:24
  • You have not shown any code for how you output something, so an answer could only be a good guess. – hakre Jan 12 '12 at 09:26
  • My question is more general than specific: I wonder, in general, what is involved in dealing with hexadecimal encoding. – Jérémy F. Jan 12 '12 at 09:36
  • @Jérémy It *should* work, as in, I can't off the top of my head think of a situation where it would not, but it's really the wrong thing to do. HTML escape any values that may break your HTML syntax, as simple as that. – deceze Jan 12 '12 at 09:39
  • *Some* characters need hex-encoding because no representation exists for them in HTML/XML, not even as entities. That is not the case with the strings in question (entities exist for those without breaking the attribute value), but in those other cases, JavaScript hex encoding would be required to transport the value unbroken inside an X(HT)ML document. See http://www.w3.org/TR/REC-xml/#charsets – hakre Jan 12 '12 at 09:43

2 Answers

59

You need to HTML escape data echoed into HTML:

printf('<article data-tags="%s">',
    htmlspecialchars(json_encode(array('html5', ...)), ENT_QUOTES, 'UTF-8'));
deceze
  • +1 always use the appropriate encoding in output, the only way to go. Invalid code points (like `\x00`) would need hex-encoding according to the X(HT)ML specs. – hakre Jan 12 '12 at 09:40
  • for simplicity htmlspecialchars(json_encode($arrayData), ENT_QUOTES,'UTF-8') – zainengineer Jun 30 '17 at 12:34
  • I json_encode with `htmlspecialchars(json_encode($arrayData), ENT_QUOTES,'UTF-8')`; how do I decode it? – SNS Nov 09 '17 at 06:23
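
On the decoding question: when a browser reads the attribute, the HTML parser already turns the entities back into plain characters, so the value can normally be handed straight to a JSON parser (for example `JSON.parse` on the attribute value). If the escaped string has to be decoded in PHP instead, a minimal sketch of the round trip could look like the following. The tag list and variable names are made up for illustration and are not from the answer above.

```php
<?php
// Hypothetical tag list, including the apostrophe from the question.
$tags = array('html5', 'jquery', 'php', "test's");

// Step 1: encode to JSON, then HTML-escape for use inside an attribute.
// ENT_QUOTES escapes both double and single quotes.
$json = json_encode($tags);
$attr = htmlspecialchars($json, ENT_QUOTES, 'UTF-8');

// Step 2: the escaped value is safe inside a double-quoted attribute.
$html = sprintf('<article data-tags="%s">', $attr);

// Step 3 (PHP-side decoding only): undo the HTML escaping, then parse
// the JSON. ENT_QUOTES is needed so the escaped apostrophe is restored.
$decoded = json_decode(htmlspecialchars_decode($attr, ENT_QUOTES), true);

echo $html, PHP_EOL;
var_dump($decoded === $tags);
```

The escaping and the JSON encoding are two independent layers, so they are undone in the reverse order in which they were applied.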
13

Or use the built-in option:

json_encode(array('html5', ...), JSON_HEX_APOS)

You can look it up in the manual: http://php.net/manual/en/json.constants.php#constant.json-hex-apos
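
As a rough illustration of what this flag changes, the sketch below compares the output with and without `JSON_HEX_APOS`. The sample data is hypothetical. Note that the flag only affects the apostrophe: characters such as `&`, `<` and `>` pass through unchanged unless `JSON_HEX_AMP` and `JSON_HEX_TAG` are added or the value is HTML-escaped.

```php
<?php
// Hypothetical tag list, including the apostrophe from the question.
$tags = array('html5', 'jquery', 'php', "test's");

// Without the flag the apostrophe is emitted literally.
$plain = json_encode($tags);

// With JSON_HEX_APOS it becomes the JSON escape \u0027, so it can no
// longer terminate a single-quoted HTML attribute.
$hex = json_encode($tags, JSON_HEX_APOS);

// Both forms decode to the same data.
$same = json_decode($hex, true) === json_decode($plain, true);

// Other markup-sensitive characters are not touched by JSON_HEX_APOS.
$other = json_encode(array('a<b & c'), JSON_HEX_APOS);

echo $plain, PHP_EOL, $hex, PHP_EOL, $other, PHP_EOL;
```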

Picard