
According to the JSON RFC: A JSON text is a sequence of tokens. The set of tokens includes six structural characters, strings, numbers, and three literal names.

A JSON text is a serialized object or array.

  JSON-text = object / array

These are the six structural characters:

  begin-array     = ws %x5B ws  ; [ left square bracket

  begin-object    = ws %x7B ws  ; { left curly bracket

  end-array       = ws %x5D ws  ; ] right square bracket

  end-object      = ws %x7D ws  ; } right curly bracket

  name-separator  = ws %x3A ws  ; : colon

  value-separator = ws %x2C ws  ; , comma

Insignificant whitespace is allowed before or after any of the six structural characters.

  ws = *(
            %x20 /              ; Space
            %x09 /              ; Horizontal tab
            %x0A /              ; Line feed or New line
            %x0D                ; Carriage return
        )
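To make the whitespace rule concrete, here is a small sketch using Python's standard `json` module (an assumption for illustration; it is not the YAJL parser asked about below). The sample documents and the helper name `same_document` are made up for this example. It shows that spaces, tabs, line feeds, and carriage returns around the structural characters do not change the parsed value, while whitespace inside a string does.

```python
import json

# The same document, once compact and once with space, tab, LF and CR
# placed around the structural characters.
compact = '{"some":["thing",1]}'
spaced = ' {\n\t"some" : [ "thing" ,\r\n 1 ]\n} '


def same_document(a: str, b: str) -> bool:
    """Return True if both JSON texts parse to equal values."""
    return json.loads(a) == json.loads(b)


# Whitespace around structural characters is insignificant.
print(same_document(compact, spaced))

# Whitespace inside a string is part of the string value.
print(same_document('{"a":" b "}', '{"a":"b"}'))
```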

So can I represent '{' and '}' as Unicode escapes? If I convert the JSON object {"some":"thing\n"} to Unicode escapes, it is represented as:

  \u007B\u0022\u0073\u006F\u006D\u0065\u0022\u003A\u0022\u0074\u0068\u0069\u006E\u0067\u006E\u0022\u007D

Why does the YAJL parser give an error for this? If I modify the input to

  "\u0022\u0073\u006F\u006D\u0065\u0022\u003A\u0022\u0074\u0068\u0069\u006E\u0067\u006E\u0022"

it parses without any error. Does anyone know the reason behind this?

  • Where does it say you can represent those characters as Unicode escape codes? Your second example is a string, which should parse just fine; your first example is simply invalid JSON, which is why it doesn't. – Lasse V. Karlsen May 04 '18 at 06:55
  • Thanks. Forgive me, but I am a novice. Could you explain 'begin-array = ws %x5B ws ; [ left square bracket'? The RFC says that, in the end, JSON is just Unicode characters, with six code points being structural tokens, so can't we write something in Unicode that is valid JSON in UTF-8? – SecResearcher May 04 '18 at 16:17
  • I don't know what "YAJL" is or does but [json.org](http://json.org/) uses the normal characters, not their unicode escape sequences. I believe what you're seeing is simply a way to specify the exact character in a parser language, not the same as saying that "%x7B" or "\u007B" is a legal representation of the `{` character (in json) – Lasse V. Karlsen May 04 '18 at 19:04
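
The point made in the comments can be checked with a short sketch. It uses Python's standard `json` module as a stand-in (an assumption; YAJL itself is not used here, though the behaviour reported in the question is consistent with it). The helper `parses` and the variable names are hypothetical. `%x7B` in the RFC's ABNF names the code point of the literal `{` character; a `\uXXXX` escape is only recognized inside a JSON string, so it cannot stand in for a structural character. Note also that the escaped form in the question decodes to `thingn`, since the backslash (`\u005C`) of `\n` was not converted.

```python
import json


def parses(text: str) -> bool:
    """Return True if Python's json module accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# The literal form from the question: structural characters written as-is.
literal = '{"some":"thing\\n"}'

# The fully escaped form from the question, copied verbatim.
escaped_text = (
    r"\u007B\u0022\u0073\u006F\u006D\u0065\u0022\u003A"
    r"\u0022\u0074\u0068\u0069\u006E\u0067\u006E\u0022\u007D"
)

# The second input from the question: the escapes sit inside double quotes,
# so the whole input is a single JSON string.
quoted = (
    '"'
    r"\u0022\u0073\u006F\u006D\u0065\u0022\u003A"
    r"\u0022\u0074\u0068\u0069\u006E\u0067\u006E\u0022"
    '"'
)

# Escapes are fine inside a string value of an otherwise literal document.
mixed = r'{"some":"\u0074hing"}'

print(parses(literal))       # accepted: ordinary JSON object
print(parses(escaped_text))  # rejected: "\u007B" is not a "{" token
print(json.loads(quoted))    # a string containing quotes and a colon, not an object
print(json.loads(mixed))     # the escape decodes to "t" inside the string
```

Under this reading, the second input succeeds only because it is one string whose content happens to look like a fragment of an object; the parser never treats the escaped quotes or colon as structure.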
