SGML has many optional features for markup minimization, such as optional or implied start and end tags, and SHORTREF for simpler aliases of tags. Is it therefore possible to write a DTD that a perfect SGML implementation (which has always been rare to non-existent) could use to successfully parse arbitrary markdown documents?

Existing markdown parsers differ from one another, which CommonMark tries to standardize away, so an SGML-based parser has some leeway in edge cases.

Crissov
  • This is a very interesting question. However, Yes/No questions and especially "Is it possible" questions are often considered problematic, because the answer is never particularly helpful: if it is "No", then you still don't know how to solve your problems, if it is "Yes", then again, you only know that you *can* solve but are still no step closer to the solution. You could ask about specific features of SGML that are prohibiting or enabling parsing Markdown, but list questions are also off-topic. Your best bet would be to re-frame the question in terms of a specific problem you are having … – Jörg W Mittag Jan 03 '19 at 11:43
  • … that can only be solved by parsing Markdown with SGML, even if the problem is hypothetical. – Jörg W Mittag Jan 03 '19 at 11:44

1 Answer

While many markdown constructs can be parsed into HTML using SGML short references, markdown's inline and reference links can't.

Inline links such as [link text](link URL) are problematic because the href attribute of the generated a element must be populated with the link URL as its value, which doesn't work at all with SGML short references. Reference links, in addition, require unbounded lookahead, since their definitions can be placed anywhere in the text, before or after the actual usage.
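To illustrate the lookahead problem outside of SGML, here is a minimal Python sketch of how a hand-written translator might resolve reference links in two passes. The function name resolve_reference_links and both regular expressions are my own simplifications for illustration, not part of any markdown specification or of sgmljs.net, and the sketch ignores escaping, titles, and most edge cases.

```python
import re

# Simplified patterns (assumptions for illustration only):
# a definition line looks like "[label]: URL",
# a usage looks like "[link text][label]" or "[link text][]".
DEFINITION = re.compile(r'^\[([^\]]+)\]:\s*(\S+)\s*$')
REFERENCE = re.compile(r'\[([^\]]+)\]\[([^\]]*)\]')


def resolve_reference_links(text: str) -> str:
    """Hypothetical helper: replace reference links with HTML a elements."""
    # Pass 1: read the WHOLE document to collect definitions. This is the
    # unbounded lookahead: a definition may follow its usage at any distance.
    definitions = {}
    body = []
    for line in text.splitlines():
        match = DEFINITION.match(line)
        if match:
            definitions[match.group(1).lower()] = match.group(2)
        else:
            body.append(line)

    # Pass 2: only now can the href attribute value be filled in.
    def render(match):
        # An empty label ("[text][]") falls back to the link text itself.
        label = (match.group(2) or match.group(1)).lower()
        url = definitions.get(label)
        if url is None:
            return match.group(0)  # no definition found: leave text untouched
        return f'<a href="{url}">{match.group(1)}</a>'

    return "\n".join(REFERENCE.sub(render, line) for line in body)


if __name__ == "__main__":
    sample = "See [the spec][cm] first.\n\n[cm]: https://commonmark.org"
    print(resolve_reference_links(sample))
```

The point of the sketch is that the first pass has to see the entire input before a single link can be emitted, and that the URL ends up in an attribute value rather than in element content; neither maps onto short reference substitution.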

Another problem is markdown auto-escaping and auto-links.

Edit: just for your info, sgmljs.net (my project) contains a full markdown (+ common extensions) to HTML translation embedded in an SGML parser. However, it merely exposes markdown short reference map declarations "virtually", via a public identifier that "magically" switches on markdown to HTML translation when referenced in a document's prolog; the actual markdown translation and processing is hard-coded in JavaScript (see http://sgmljs.net/docs/markdown.html).

A further problem with using markdown from SGML, not mentioned above, is that markdown wants a "markup block" (the HTML block, generalized to allow any explicit element tags or other markup constructs) to be separated by newline(s) from the preceding and following markdown text, which is a constraint that cannot be captured in SGML.

imhotap