What should I read on json and html parsers to build one myself?

Question

I want to write a JSON parser and an HTML parser to deepen my knowledge of both formats (not to reinvent them to be "more efficient", as you might think). What should I read to succeed at it?

P.S: I know about parsing laws, but couldn't find any for JSON.

P.P.S: C++ implementation is my target.

score 0 · Accepted Answer · edited Oct 07 '21 at 11:21

JSON is specified in RFC 8259 (using ABNF) and ECMA-404 (using railroad diagrams). Since they both define the same grammar, which of the two you use is unimportant; go for the one you find easier.
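To give a feel for how closely code can follow that grammar, here is a minimal sketch of a recursive-descent recognizer in C++17, with one function per grammar rule. It only answers "is this valid JSON?" and builds no tree. The names JsonValidator, is_valid_json, kMaxDepth and run_examples are made up for illustration, the depth limit is an arbitrary choice, and the sketch does not check that string contents are valid UTF-8, so treat it as a starting point rather than a conforming implementation.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical recognizer: each private method mirrors one RFC 8259 rule.
class JsonValidator {
public:
    explicit JsonValidator(std::string_view text) : s_(text) {}

    // JSON-text = ws value ws
    bool validate() {
        skip_ws();
        if (!value(0)) return false;
        skip_ws();
        return pos_ == s_.size();  // no trailing garbage allowed
    }

private:
    static constexpr int kMaxDepth = 256;  // arbitrary guard against deep nesting
    std::string_view s_;
    std::size_t pos_ = 0;

    bool at_end() const { return pos_ >= s_.size(); }
    char peek() const { return at_end() ? '\0' : s_[pos_]; }
    bool eat(char c) { if (peek() == c && !at_end()) { ++pos_; return true; } return false; }
    bool is_digit() const { return !at_end() && s_[pos_] >= '0' && s_[pos_] <= '9'; }
    bool digits() { if (!is_digit()) return false; while (is_digit()) ++pos_; return true; }
    void skip_ws() {
        while (!at_end() && (s_[pos_] == ' ' || s_[pos_] == '\t' ||
                             s_[pos_] == '\n' || s_[pos_] == '\r')) ++pos_;
    }
    bool literal(std::string_view word) {
        if (s_.substr(pos_, word.size()) != word) return false;
        pos_ += word.size();
        return true;
    }
    // number = [ "-" ] int [ frac ] [ exp ]
    bool number() {
        eat('-');
        if (!eat('0') && !digits()) return false;  // a leading zero stands alone
        if (eat('.') && !digits()) return false;
        if (eat('e') || eat('E')) {
            if (!eat('+')) eat('-');
            if (!digits()) return false;
        }
        return true;
    }
    // string = quotation-mark *char quotation-mark
    bool string() {
        if (!eat('"')) return false;
        while (!at_end()) {
            unsigned char c = static_cast<unsigned char>(s_[pos_++]);
            if (c == '"') return true;
            if (c < 0x20) return false;  // raw control characters are not allowed
            if (c != '\\') continue;
            if (at_end()) return false;
            char e = s_[pos_++];
            if (e == 'u') {  // \uXXXX: exactly four hex digits
                for (int i = 0; i < 4; ++i, ++pos_)
                    if (at_end() || !std::isxdigit(static_cast<unsigned char>(s_[pos_])))
                        return false;
            } else if (std::string_view("\"\\/bfnrt").find(e) == std::string_view::npos) {
                return false;
            }
        }
        return false;  // unterminated string
    }
    // array = "[" [ value *( "," value ) ] "]"
    bool array(int depth) {
        eat('[');
        skip_ws();
        if (eat(']')) return true;
        for (;;) {
            skip_ws();
            if (!value(depth)) return false;
            skip_ws();
            if (eat(']')) return true;
            if (!eat(',')) return false;
        }
    }
    // object = "{" [ member *( "," member ) ] "}", member = string ":" value
    bool object(int depth) {
        eat('{');
        skip_ws();
        if (eat('}')) return true;
        for (;;) {
            skip_ws();
            if (!string()) return false;
            skip_ws();
            if (!eat(':')) return false;
            skip_ws();
            if (!value(depth)) return false;
            skip_ws();
            if (eat('}')) return true;
            if (!eat(',')) return false;
        }
    }
    // value = false / null / true / object / array / number / string
    bool value(int depth) {
        if (depth > kMaxDepth) return false;
        switch (peek()) {
            case '{': return object(depth + 1);
            case '[': return array(depth + 1);
            case '"': return string();
            case 't': return literal("true");
            case 'f': return literal("false");
            case 'n': return literal("null");
            default:  return number();
        }
    }
};

bool is_valid_json(std::string_view text) { return JsonValidator(text).validate(); }

// Usage sketch: call this from main() to exercise the recognizer.
void run_examples() {
    assert(is_valid_json(R"({"a": [1, 2.5e3, "x\n"], "b": null})"));
    assert(!is_valid_json("[1, 2,]"));  // trailing comma is rejected
    assert(!is_valid_json("01"));       // leading zeros are rejected
}
```

Turning this into a real parser is mostly a matter of having each function return a node (or an error position) instead of a bool.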

JSON parsing is pretty simple. HTML, on the other hand, is a huge project, made more complicated by the absence of a versioned authoritative standard, which makes it a bit of a moving target.

HTML parsing as currently defined by the "living standard" is a procedure which probably cannot be encapsulated in a context-free grammar. No real attempt is made to use grammatical descriptions in the standard, although it is possible to extract at least a lexical grammar, if you ignore the sections dealing with the handling of lexical errors.

Certainly, you could write a parser for a well-behaved subset, but that parser might not cope well with many of the "HTML" documents you will want to process. Personally, for learning purposes, I'd suggest trying your hand at XML. (Also see XML Namespaces.)

I am really thankful for your detailed response! How do you find this kind of information? — hououin kyouma, Mar 21 '20 at 07:05
