
I can't figure out how to write a DTD for an XML file that can contain the same elements in mixed order.

A small example that shows the problem is below:

<root>

  <element>
    <one></one>
    <two></two>
  </element>

  <element>
    <two></two>
    <one></one>
  </element>

  <element>
    <two></two>
    <two></two>
    <two></two>
    <two></two>
    <one></one>
    <one></one>
  </element>

</root>

My DTD:

<!ELEMENT root (element*)>
<!ELEMENT element ((one*,two*)|(two*,one*))>

I found a similar topic, but its solution does not work in my case (and I'm not sure what is wrong with my DTD at the moment). I get this error message:

xmllint: Content model of Instructors is not determinist: ((one* , two*) | (two* , one*))
afaf12

3 Answers

<!ELEMENT element (one|two)*>

(Or + if you must have at least one.)
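As a rough way to see what this model accepts, the sketch below translates it into a regular expression and tests a few child sequences. The single-character encoding (`CODE`) and the helper name `accepts` are assumptions made for illustration only; a real validator works on the DTD itself.

```python
import re

# Hypothetical encoding: each child element name becomes one character.
CODE = {"one": "1", "two": "2"}

# Regular-expression counterpart of the content model (one|two)*
MODEL = re.compile(r"(?:1|2)*")


def accepts(children):
    """Return True if the sequence of child names fits (one|two)*."""
    encoded = "".join(CODE[name] for name in children)
    return MODEL.fullmatch(encoded) is not None


print(accepts(["one", "two"]))         # grouped: ones first
print(accepts(["two", "two", "one"]))  # grouped: twos first
print(accepts(["one", "two", "one"]))  # interleaved
print(accepts([]))                     # empty <element/>
```

Every call returns True, including the interleaved case, so this model is more permissive than the two-branch model in the question.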

Dave Newton

Your solution is not deterministic, because

<element>
    <two/>
</element>

is one of the cases that matches both of the branches: (one*, two*) and (two*, one*).
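The overlap can be checked mechanically. The sketch below treats each branch as a regular expression and lists which branches accept a given child sequence. The encoding and the helper name `matching_branches` are illustrative assumptions, and overlapping branches are only a symptom here: the formal rule in the XML spec is stated in terms of which occurrence of an element name in the model a child matches.

```python
import re

# Hypothetical encoding: each child element name becomes one character.
CODE = {"one": "1", "two": "2"}

# The two branches of the content model from the question.
BRANCHES = {
    "(one*, two*)": re.compile(r"1*2*"),
    "(two*, one*)": re.compile(r"2*1*"),
}


def matching_branches(children):
    """Return the labels of all branches that accept the child sequence."""
    encoded = "".join(CODE[name] for name in children)
    return [label for label, rx in BRANCHES.items() if rx.fullmatch(encoded)]


for children in (["two"], ["one", "one"], [], ["one", "two"], ["two", "one"]):
    print(children, "->", matching_branches(children))
```

Sequences made of a single element type, and the empty sequence, are accepted by both branches; only sequences that contain both element types select exactly one branch.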

As @Christopher noted, @Dave's answer allows mixed ordering, and Christopher's answer fixes that problem. But actually Christopher's answer is not deterministic either, because when validating the input

<element>
    <two/>
</element>

and the validator encounters the first <two>, it doesn't know which branch it should select. It only knows this after all of the <two> elements have been read.

To keep the order consistent while keeping the model deterministic, use

<!ELEMENT element ( (one+, two*) | (two+, one*) )? >

The key points here are: 1) the model stays deterministic because each branch begins with a different mandatory element, and 2) an empty <element/> is still allowed, because the ? at the end makes the whole content model optional.
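A brute-force check can back up both points. The sketch below, which again assumes a one-character encoding of element names (1 for one, 2 for two), enumerates every child sequence up to a chosen length and compares the original model with the rewritten one. The function names and the length bound are illustrative choices, so this is evidence for short sequences rather than a proof.

```python
import re
from itertools import product

# Assumed encoding: "1" stands for <one/>, "2" stands for <two/>.
ORIGINAL = re.compile(r"1*2*|2*1*")          # ((one*, two*) | (two*, one*))
REWRITTEN = re.compile(r"(?:1+2*|2+1*)?")    # ((one+, two*) | (two+, one*))?
BRANCH_ONE_FIRST = re.compile(r"1+2*")       # (one+, two*)
BRANCH_TWO_FIRST = re.compile(r"2+1*")       # (two+, one*)


def all_sequences(max_len):
    """Yield every encoded child sequence of length 0..max_len."""
    for length in range(max_len + 1):
        for combo in product("12", repeat=length):
            yield "".join(combo)


def same_language(max_len=8):
    """True if both models accept exactly the same sequences up to max_len."""
    return all(
        bool(ORIGINAL.fullmatch(s)) == bool(REWRITTEN.fullmatch(s))
        for s in all_sequences(max_len)
    )


def branches_overlap(max_len=8):
    """True if some sequence up to max_len is accepted by both rewritten branches."""
    return any(
        BRANCH_ONE_FIRST.fullmatch(s) and BRANCH_TWO_FIRST.fullmatch(s)
        for s in all_sequences(max_len)
    )


print(same_language())     # do the two models agree?
print(branches_overlap())  # does any sequence fit both rewritten branches?
```

Within the tested lengths the two models accept the same sequences, and no sequence fits both rewritten branches, which is what lets a validator commit to a branch on the first child.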

jasso

The DTD as given is not deterministic, and an XML parser may report an error for that. (Cf. Section 3.2.1 (normative) and Appendix E (non-normative) of the XML spec. The reason is compatibility with SGML, if anyone remembers that.)

In your DTD, an empty element would match both branches. Dave's solution changes the meaning of the DTD in that it also accepts

<root>
  <element>
    <one />
    <two />
    <one />
  </element>
</root>

If you don't want that, make sure that at every “or”-branch, you'd know exactly which one to take by only looking one token ahead, e.g., by writing

<!ELEMENT element ((one+, two*) | (two+, one*))? >
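The one-token-lookahead idea can be sketched as a comparison of the names that may start each branch. The representation below (a branch as a list of name and quantifier pairs) and the function names are assumptions for illustration. The check only looks at the top-level choice, so it is a simplified stand-in for the full rule described in Appendix E of the XML spec; for instance, it would not notice a problem inside a single branch such as (one*, one).

```python
def first_set(branch):
    """Return (names that can start the branch, whether the branch can be empty)."""
    names = set()
    for name, quantifier in branch:
        names.add(name)
        if quantifier in ("", "+"):
            # A mandatory item: nothing after it can be the first child.
            return names, False
    return names, True


def choice_is_predictable(branches):
    """True if the first child (or its absence) always selects a single branch."""
    seen = set()
    nullable_branches = 0
    for branch in branches:
        names, nullable = first_set(branch)
        if names & seen:
            return False  # two branches can start with the same element
        seen |= names
        nullable_branches += nullable
    return nullable_branches <= 1


# ((one*, two*) | (two*, one*))
ORIGINAL = [[("one", "*"), ("two", "*")], [("two", "*"), ("one", "*")]]
# The choice inside ((one+, two*) | (two+, one*))?
REWRITTEN = [[("one", "+"), ("two", "*")], [("two", "+"), ("one", "*")]]

print(choice_is_predictable(ORIGINAL))
print(choice_is_predictable(REWRITTEN))
```

The original choice fails the check because both branches can start with either element, while the rewritten choice passes because the branches start with different mandatory elements.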
Christopher Creutzig