I am looking to define a mapping between an object model and a file-and-directory structure, where the files are a mixture of XML and proprietary formats.
I would like to do this in as standard and portable a way as possible, without having to write a lot of boilerplate code to map the proprietary formats into the object model. Perhaps these proprietary formats could be defined using ASN.1.
This directory structure may also contain compressed files, which must be usable as virtual file systems.
I would like to be able to cross-reference files in the directory structure in a canonical way.
It is important that the schema language has great Java support and good C++ support. Support for Python and other languages would be a bonus.
It should allow nested variants of file structures and the specification of a canonical variant at each level.
There may be variants of directory structures but there would always be a canonical layout.
e.g. (using the Java/VFS2 style filename format)
The canonical format:
major
  minor
    binaryFileDDMMYY01.bin
    auditFileDDMMYY01.xml
      /elements
        /element[0]...
    binaryFileDDMMYY02.bin
    auditFileDDMMYY02.xml
A variant:
major
  minor
    12.zip!
      binaryFileDDMMYY01.bin
      auditFileDDMMYY01.xml
        /elements
          /element[0]...
      binaryFileDDMMYY02.bin
      auditFileDDMMYY02.xml
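To illustrate what I mean by the two layouts being interchangeable, here is a rough sketch that reads a file from the canonical location and falls back to the zipped variant. It uses the JDK's built-in zip file system provider (java.nio.file) rather than Commons VFS2, purely to keep it dependency-free. LayoutResolver, its read method and the archiveName parameter are hypothetical names of my own, and the "try canonical first, then the archive" rule is an assumption, not a requirement.

```java
import java.io.IOException;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: reads a file from the canonical layout or from a zipped variant. */
public final class LayoutResolver {

    private LayoutResolver() {
    }

    /**
     * @param minorDir    the "major/minor" directory on disk
     * @param fileName    e.g. "auditFileDDMMYY01.xml"
     * @param archiveName e.g. "12.zip" (assumed to sit directly in minorDir)
     */
    public static byte[] read(Path minorDir, String fileName, String archiveName)
            throws IOException {
        // Canonical layout: the file sits directly in the directory.
        Path plain = minorDir.resolve(fileName);
        if (Files.isRegularFile(plain)) {
            return Files.readAllBytes(plain);
        }
        // Variant layout: mount the archive as a virtual file system and
        // look the file up at the archive root.
        Path archive = minorDir.resolve(archiveName);
        try (FileSystem zip = FileSystems.newFileSystem(archive, (ClassLoader) null)) {
            return Files.readAllBytes(zip.getPath("/" + fileName));
        }
    }
}
```

A call such as LayoutResolver.read(Paths.get("major/minor"), "auditFileDDMMYY01.xml", "12.zip") would then behave the same for both layouts shown above, which is the property I would like the schema to be able to express declaratively.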
As I am already using XML, and XML has xref/link elements, XML would seem the obvious format. Whatever de facto format I use, though, I will need to hook into the parser/object model to map the proprietary formats into something that plays well with the object model of whichever framework I choose (on Java, perhaps a proprietary SAX/DOM implementation that maps to and from the file format). This could be done with custom URL formats (VFS2 style) or with schema extensions that define them, e.g.:
<xref href="zip:/major/minor/12.zip!auditFileDDMMYY01.xml"/>
and
<xref href="acme:zip:/major/minor/12.zip!binaryFileDDMMYY01.bin"/>
or
<xref format="acme" href="zip:/major/minor/12.zip!binaryFileDDMMYY01.bin"/>
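To make the three styles concrete, here is a minimal sketch of how such an href might be split into a format tag, an archive path and an entry path. It is plain string handling, not an existing library API: XrefParser and Xref are hypothetical names, and the rule that anything before "zip:" is a format prefix (with an explicit format attribute taking precedence) is my own assumption.

```java
import java.util.Objects;

/** Hypothetical parser for the xref href styles shown above. */
public final class XrefParser {

    /** Hypothetical value type holding the parts of one cross-reference. */
    public static final class Xref {
        public final String format;  // e.g. "acme"; null when no format was given
        public final String archive; // e.g. "/major/minor/12.zip"
        public final String entry;   // path inside the archive

        Xref(String format, String archive, String entry) {
            this.format = format;
            this.archive = archive;
            this.entry = entry;
        }
    }

    private XrefParser() {
    }

    /**
     * @param href       the href attribute value
     * @param formatAttr the format attribute value, or null if absent
     */
    public static Xref parse(String href, String formatAttr) {
        Objects.requireNonNull(href, "href");
        int zipAt = href.indexOf("zip:");
        if (zipAt < 0) {
            throw new IllegalArgumentException("not a zip: URL: " + href);
        }
        String format = formatAttr;
        if (zipAt > 0) {
            // Assumed convention: a prefix such as "acme:" names the format.
            String prefix = href.substring(0, zipAt);
            if (!prefix.endsWith(":")) {
                throw new IllegalArgumentException("malformed prefix: " + href);
            }
            if (format == null) {
                format = prefix.substring(0, prefix.length() - 1);
            }
        }
        String rest = href.substring(zipAt + "zip:".length());
        int bang = rest.indexOf('!');
        if (bang < 0) {
            throw new IllegalArgumentException("missing '!' separator: " + href);
        }
        String entry = rest.substring(bang + 1);
        if (entry.startsWith("/")) {
            entry = entry.substring(1); // accept both "!name" and "!/name"
        }
        return new Xref(format, rest.substring(0, bang), entry);
    }
}
```

With this reading, the second and third styles parse to the same format, archive and entry, so the choice between a URL prefix and a separate attribute would be purely a matter of what the schema language can express.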
Is there any alternative to XSD that would be better suited to achieving this? It need not be XML-specific, but it must cater for XML interchange.