
I would like to get an XML structure like:

<root>
    allow<back>no#PCDATA</back>
    allow<front>allow#PCDATA</front>
</root>

I have:

<!ELEMENT root (back?,front?)>
<!ELEMENT back (js*)>
<!ELEMENT front (para*)>
  • Welcome to Stack Overflow. Please take the [tour] to learn how Stack Overflow works and read [ask] on how to improve the quality of your question. It is unclear what you are asking or what the problem is. Please [edit] your question to include a more detailed description of the problem you have. If necessary, provide more example XML data of what you are trying to do. – Progman May 14 '23 at 08:03
  • You really haven't made it clear what structures you want to allow and what you want to disallow. Where do the element names `js` and `para` come into it? – Michael Kay May 14 '23 at 09:28
  • Please provide enough code so others can better understand or reproduce the problem. – Community May 15 '23 at 07:17

1 Answer


Using XML DTDs, the best you can get is

<!DOCTYPE root [
  <!ELEMENT root (#PCDATA|back|front)*>
  <!ELEMENT back (js*)>
  <!ELEMENT front (#PCDATA|para)*>
]>
<root>
  allow<back><!-- no#PCDATA --></back>
  allow<front>allow#PCDATA</front>
</root>

since XML DTDs place restrictions on how the #PCDATA content token can be used: according to the XML specification, it has to be part of a choice group (specifically, it must be the first member of a group of element names separated by the | connector), and that group must be followed by * whenever it lists any element names.

You can check this example using Libxml2 (the xmllint --valid command line utility).
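As a rough illustration of the mixed-content restriction described above, here is a minimal Python sketch that tests whether a content model string has one of the forms XML permits for #PCDATA. The function name `is_valid_mixed` and the simplified `NAME` pattern are assumptions made for this example; it is not a DTD parser and does not replace checking with a real validator.

```python
import re

# Simplified approximation of an XML Name (assumption: ASCII-style names only).
NAME = r"[A-Za-z_:][\w.\-:]*"

# XML allows #PCDATA only as "(#PCDATA)" or as the first member of a
# starred choice group such as "(#PCDATA|a|b)*".
MIXED = re.compile(
    rf"\(\s*#PCDATA(?:\s*\|\s*{NAME})*\s*\)\*"
    rf"|\(\s*#PCDATA\s*\)"
)

def is_valid_mixed(model: str) -> bool:
    """Return True if `model` is a mixed-content model XML would accept."""
    return MIXED.fullmatch(model.strip()) is not None

if __name__ == "__main__":
    for model in [
        "(#PCDATA|back|front)*",           # XML form used for root above
        "(#PCDATA,back?,#PCDATA,front?)",  # sequence with #PCDATA: not XML
        "(#PCDATA|back)",                  # missing the trailing *
    ]:
        print(model, is_valid_mixed(model))
```

The two content models containing #PCDATA in the XML DTD above pass this check, whereas a sequence group that mentions #PCDATA does not.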

SGML, on the other hand, on which XML is based, and of which XML DTD is designed to be a subset, doesn't have this restriction and allows #PCDATA to occur multiple times:

<!DOCTYPE root [
  <!-- NOTE: this is SGML not XML -->
  <!ELEMENT root - - (#PCDATA,((back,#PCDATA,front?)|(front?)))>
  <!ELEMENT back - - (js*)>
  <!ELEMENT front - - (#PCDATA|para)*>
]>
<root>
  allow<back><!-- no#PCDATA --></back>
  allow<front>allow#PCDATA</front></root>

You can check these SGML examples using OpenSP (the osgmlnorm command line utility) or sgmljs (the sgmlproc command line utility). However, there are restrictions with SGML in this context as well:

  • you will have noticed that the </root> end-element tag is put at the end of the line rather than on a line of its own; this is because SGML would interpret a newline as character data unless it follows a line containing only a single element with its start- and end-element tags, in which case it considers that newline to be there solely for formatting purposes

  • a content model such as (#PCDATA,back?,#PCDATA,front?) is ambiguous and thus disallowed, because if the optional back element isn't present, text content could be attributed to either of the two #PCDATA tokens
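
The ambiguity in the second point can be sketched with a toy check in Python. This is only a heuristic for a flat sequence group, not SGML's actual ambiguity rule, and the function name `has_ambiguous_pcdata` and the token-list representation are assumptions made for this example. It flags a sequence in which two #PCDATA tokens are separated only by optional elements, so that text could belong to either token when those elements are absent.

```python
def has_ambiguous_pcdata(tokens):
    """Toy check on a flat sequence group given as a list of tokens,
    e.g. ["#PCDATA", "back?", "#PCDATA", "front?"].

    Returns True if two #PCDATA tokens can end up adjacent because
    everything between them is optional ("?" or "*").
    """
    pending = False  # True while an earlier #PCDATA may still be adjacent
    for tok in tokens:
        if tok == "#PCDATA":
            if pending:
                return True
            pending = True
        elif not tok.endswith(("?", "*")):
            # A required element always separates the two text runs.
            pending = False
    return False

if __name__ == "__main__":
    # Disallowed model from the text: back is optional.
    print(has_ambiguous_pcdata(["#PCDATA", "back?", "#PCDATA", "front?"]))
    # One branch of the working SGML model: back is required.
    print(has_ambiguous_pcdata(["#PCDATA", "back", "#PCDATA", "front?"]))
```

This mirrors why the SGML declaration above splits root's content into a branch where back is required and a branch without back.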

imhotap