The following SGML markup declarations tell an SGML parser to treat RawPayload content as unparsed character data (CDATA declared content), so that the < and & characters, normally interpreted as the markup delimiter and the entity-reference open character, respectively, can appear verbatim in content:
<!ELEMENT TestCase - -
(TestSuiteVersion,TestName,TestEnabled,
TestURL,RawPayload,ParsedOutput)>
<!ELEMENT TestSuiteVersion - - (#PCDATA)>
<!ELEMENT TestName - - (#PCDATA)>
<!ELEMENT TestEnabled - - (#PCDATA)>
<!ELEMENT TestURL - - (#PCDATA)>
<!ELEMENT RawPayload - - CDATA>
<!ELEMENT ParsedOutput - - (#PCDATA)>
However, since your original question is specifically about tunneling HTML or other markup, rather than generic text content, through elements with declared content CDATA, it's worth noting that this won't work as expected: per the SGML spec (ISO 8879:1986), unparsed character data is terminated by any character sequence </X where X is a character that is valid as an (element) name start character. Thus, if you attempt to include any angle-bracket markup as content, an SGML parser will leave unparsed character data mode at the first thing that looks like an end-element tag (and will immediately fail with our example DTD, since end-element tag omission is not allowed for RawPayload).
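To make the termination rule concrete, here is a small sketch of two RawPayload instances under the CDATA declaration above. The payload text is made up for illustration; the comments describe what the rule stated above implies, not output from any particular parser.

```sgml
<!-- Fine: "<" and "&" appear, but never "</" followed by a name start character -->
<RawPayload>if (a < b && c > d) return;</RawPayload>

<!-- Not fine: CDATA mode ends at "</h2", which is not the end tag of RawPayload -->
<RawPayload><h2>Title</h2></RawPayload>
```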
Rather, in SGML, you can include regular HTML markup without any use of CDATA elements or CDATA marked sections by importing the parsing rules for HTML as an SGML DTD grammar. The following example shows a self-contained SGML document declaring your TestCase vocabulary that also imports (my) markup declarations for HTML:
<!DOCTYPE TestCase SYSTEM "http://sgmljs.net/schemas/sgml-cms/w3c/html5.dtd" [
<!ELEMENT TestCase - - (TestSuiteVersion,TestName,TestEnabled,TestURL,RawPayload,ParsedOutput)>
<!ELEMENT TestSuiteVersion - - (#PCDATA)>
<!ELEMENT TestName - - (#PCDATA)>
<!ELEMENT TestEnabled - - (#PCDATA)>
<!ELEMENT TestURL - - (#PCDATA)>
<!ELEMENT RawPayload - - ANY -(TestSuiteVersion|TestName|TestEnabled|TestURL|RawPayload|ParsedOutput)>
<!ELEMENT ParsedOutput - - (#PCDATA)>
<!ENTITY % no_entities "INCLUDE">
]>
<TestCase>
<TestSuiteVersion>1</TestSuiteVersion>
<TestName>Test1</TestName>
<TestEnabled>true</TestEnabled>
<TestURL>http://example.com</TestURL>
<RawPayload>
<h2>Description of whatever is supposed to happen</h2>
<p>Bla Blah bla</p>
</RawPayload>
<ParsedOutput>2021-12-20T19:32:52Z</ParsedOutput>
</TestCase>
By declaring RawPayload with the content model ANY, this DTD admits any HTML 5 elements declared in html5.dtd. I've also specified the element exclusion -(TestSuiteVersion|TestName|TestEnabled|TestURL|RawPayload|ParsedOutput), telling SGML that those elements must not occur anywhere within RawPayload content, at any nesting depth.
Depending on your app, it's generally advisable to avoid handling HTML as black-box CDATA content, since doing so leaves you prone to HTML injection attacks. Rather, if you eventually intend to display user content in a browser, you should scan/filter it for malicious content. Similar to what's shown here, you'd need to exclude at least script elements, as well as HTML event handler attributes containing script (or set CSP accordingly for your web app).
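As a minimal sketch of that idea, the RawPayload declaration from the example could list script among its exclusions. This assumes html5.dtd declares the element under the name script. It does nothing about event handler attributes such as onclick: an exclusion exception can only name elements, so those would still need separate filtering or a suitable CSP.

```sgml
<!-- Same declaration as in the example, with script added to the exclusions -->
<!ELEMENT RawPayload - - ANY
  -(TestSuiteVersion|TestName|TestEnabled|TestURL|RawPayload|ParsedOutput|script)>
```

With this declaration in place, a validating parser should report a script element anywhere inside RawPayload as an error instead of passing it through.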
You can run this example document as-is using (my) sgmljs software (http://sgmljs.net), e.g. the sgmlproc command line utility. When run with OpenSP, you'd also need to provide an SGML declaration for HTML.