Enjoy RELAX NG compact syntax
After experimenting with various XML schema languages, I have found RELAX NG to be the best fit for most cases (reasoning at the end).
Requirements
- Allow documenting XML document structure
- Do it in readable form
- Keep it simple for the author
Modified sample XML (doc.xml)
I have added one attribute to show how attributes can be documented as well.
<objectRoot created="2015-05-06T20:46:56+02:00">
  <v>
    <!-- Current version of the object from the repository. -->
    <!-- (Occurrence: 1) -->
  </v>
  <label>
    <!-- Name of the object from the repository. -->
    <!-- (Occurrence: 0 or 1 or Many) -->
  </label>
</objectRoot>
Use RELAX NG Compact syntax with comments (schema.rnc)
RELAX NG lets you describe the structure of the sample XML in the following way:
start =
  ## Container for one object
  element objectRoot {
    ## datetime of object creation
    attribute created { xsd:dateTime },
    ## Current version of the object from the repository
    ## Occurrence 1 is assumed by default
    element v {
      text
    },
    ## Name of the object from the repository
    ## Note: the occurrence is denoted by the "*" and means 0 or more
    element label {
      text
    }*
  }
I think it is very hard to beat this simplicity at the same level of expressiveness.
How to comment the structure
- Always place the comment before the relevant element, not after it.
- For readability, use one blank line before the comment block.
- Use the ## prefix, which is automatically translated into a documentation element in other schema formats. A single hash # translates into an XML comment, not a documentation element.
- Multiple consecutive comments (as in the example) turn into a single multi-line documentation string within a single element.
- Obvious fact: the inline XML comments in doc.xml are irrelevant; only what is in schema.rnc counts.
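The commenting rules above can be sketched in a tiny schema. This is only an illustration: the element name note is hypothetical and not part of schema.rnc.

```rnc
# Single hash: an ordinary comment, not schema documentation.
start =
  ## Double hash: documentation attached to the element that follows.
  ## A second consecutive line joins the same documentation string.
  element note { text }
```

Only the two ## lines should end up as documentation for note when the schema is converted; the # line is treated as a plain comment.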
If XML Schema 1.0 is required, generate it (schema.xsd)
Assuming you have the open-source tool trang available, you can create an XML Schema file as follows:
$ trang schema.rnc schema.xsd
The resulting schema looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="objectRoot">
    <xs:annotation>
      <xs:documentation>Container for one object</xs:documentation>
    </xs:annotation>
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="v"/>
        <xs:element minOccurs="0" maxOccurs="unbounded" ref="label"/>
      </xs:sequence>
      <xs:attribute name="created" use="required" type="xs:dateTime">
        <xs:annotation>
          <xs:documentation>datetime of object creation</xs:documentation>
        </xs:annotation>
      </xs:attribute>
    </xs:complexType>
  </xs:element>
  <xs:element name="v" type="xs:string">
    <xs:annotation>
      <xs:documentation>Current version of the object from the repository
Occurrence 1 is assumed by default</xs:documentation>
    </xs:annotation>
  </xs:element>
  <xs:element name="label" type="xs:string">
    <xs:annotation>
      <xs:documentation>Name of the object from the repository
Note: the occurrence is denoted by the "*" and means 0 or more</xs:documentation>
    </xs:annotation>
  </xs:element>
</xs:schema>
Now clients who insist on using only XML Schema 1.0 can use your XML document specification too.
Validating doc.xml against schema.rnc
There are open-source tools such as jing and rnv that support the RELAX NG Compact syntax and work on both Linux and MS Windows.
Note: these tools are rather old but very stable. Read their age as a sign of maturity, not of obsolescence.
Using jing:
$ jing -c schema.rnc doc.xml
The -c option is important, because by default jing assumes RELAX NG in its XML form.
Using rnv to check that schema.rnc itself is valid:
$ rnv -c schema.rnc
and to validate doc.xml:
$ rnv schema.rnc doc.xml
rnv allows validating multiple documents at once:
$ rnv schema.rnc doc.xml otherdoc.xml anotherone.xml
RELAX NG Compact syntax - pros
- very readable; even a newcomer should understand the text
- easy to learn (RELAX NG comes with a good tutorial; one can learn most of it within a day)
- very flexible (although it looks simple, it covers many situations, some of which cannot be handled by XML Schema 1.0 at all)
- tools exist for converting it into other formats (RELAX NG XML form, XML Schema 1.0, DTD, and even for generating a sample XML document)
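One illustration of the flexibility point is content that depends on an attribute value. The following is a minimal sketch with hypothetical names (entry, kind, size, child) that do not appear in schema.rnc. As far as I know, XML Schema 1.0 has no direct equivalent for this kind of rule without resorting to xsi:type.

```rnc
start =
  element entry {
    # kind="file" requires exactly one size element
    (attribute kind { "file" }, element size { xsd:integer })
    # kind="folder" allows any number of child elements instead
    | (attribute kind { "folder" }, element child { text }*)
  }
```

Each alternative ties one attribute value to its own content model, so a validator would reject, for example, an entry with kind="file" that contains child elements.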
RELAX NG limitations
- Multiplicity can only be "zero or one", "exactly one", "zero or more" or "one or more". (A multiplicity limited to a small number of elements can be described by "stupid repetition" of "zero or one" definitions.)
- There are XML Schema 1.0 constructs that cannot be described by RELAX NG.
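The "stupid repetition" workaround from the first limitation might look like the sketch below. It assumes, purely for illustration, that objectRoot should carry between two and four label elements; the named pattern label is introduced here and is not part of the original schema.rnc.

```rnc
start =
  element objectRoot {
    attribute created { xsd:dateTime },
    element v { text },
    # two required labels, then up to two optional ones: 2 to 4 in total
    label, label, label?, label?
  }

# named pattern, so the element definition is written only once
label = element label { text }
```

This stays readable for small bounds, but it quickly becomes unwieldy for something like "between 10 and 50", which is where the limitation really shows.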
Conclusions
For the requirements defined above, the RELAX NG Compact syntax looks like the best fit. With RELAX NG you get both: a human-readable schema that is also usable for automated validation.
The existing limitations rarely come into play and can in many cases be worked around with comments or other means.