I am trying to define an XSD template for the following:

<template_data>
  <given_name lang="ENG">Zluty</given_name>
  <given_name lang="CES">Žlutý</given_name>
</template_data>

So far, I've come up with

<xs:complexType name="attribute_CES">
  <xs:attribute name="lang" type="xs:string" use="required" fixed="CES"/>
</xs:complexType>

<xs:complexType name="attribute_ENG">
  <xs:attribute name="lang" type="xs:string" use="required" fixed="ENG"/>
</xs:complexType>

<xs:element name="template_data">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="given_name" type="attribute_CES"/>
      <xs:element name="given_name" type="attribute_ENG"/>          
    </xs:sequence>
  </xs:complexType>
</xs:element>

The problem is that this declares an element with the same name twice, each time with a different type, and every XSD validator I've tried rejects it.

As far as I know, you can require an attribute to have a specific value with the fixed option, and that is included in the definition of a (complex) type. So if you want the attribute with a different value, you would have to define a new type.

What I need is for template_data to include both given_names, exactly once with lang="CES" and exactly once with lang="ENG". Is there a way to write an XSD validation schema for that, or is that impossible (for example, if the XML input doesn't conform to standards)?

Humungus
  • This is not possible with XSD since this means validating the content - XSD can only validate the schema. You'll need something like [Schematron](http://en.wikipedia.org/wiki/Schematron) to achieve what you need. – Filburt Mar 12 '14 at 14:40
  • Really? I've seen some basic content validation with XSD, using `restriction` (http://www.w3schools.com/schema/schema_facets.asp) and with `fixed` in attributes (http://www.w3schools.com/schema/schema_simple_attributes.asp), or with types. – Humungus Mar 12 '14 at 14:49

1 Answer

You can't declare two elements with the same name with different types in the same context, but I think I understand what you want to do.

If you really had elements with very different contents, it would make sense to create two types (and it would also make sense for them to have different names or to at least occur in another context). Since your data is similar, and the main difference is an attribute which describes the text content of the element, you can create one type and restrict the values the attribute can receive:

<xs:complexType name="languageType">
    <xs:simpleContent>
        <xs:extension base="xs:string">
            <xs:attribute name="lang" use="required">
                <xs:simpleType>
                    <xs:restriction base="xs:NMTOKEN">
                        <xs:enumeration value="ENG"/>
                        <xs:enumeration value="CES"/>
                    </xs:restriction>
                </xs:simpleType>
            </xs:attribute>
        </xs:extension>
    </xs:simpleContent>
</xs:complexType>

In languageType above you have simple content (xs:string) and a required lang attribute which can only have two values: ENG or CES.

If you want to guarantee that there are exactly two elements, you can restrict that in your template_data element definition with minOccurs="2" and maxOccurs="2" for the given_name child element:

<xs:element name="template_data">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="given_name" type="languageType" minOccurs="2" maxOccurs="2"/>        
        </xs:sequence>
    </xs:complexType>
    ...

At this point it is still possible to have two given_name elements with the same lang="ENG" attribute. To prevent that, we can add an xs:key definition in the context of the template_data element definition:

<xs:element name="template_data">
    <xs:complexType> ... </xs:complexType>
    <xs:key name="languageKey">
        <xs:selector xpath="given_name" />
        <xs:field xpath="@lang"/>
    </xs:key>
</xs:element>

The xs:key uses the nested given_name elements as its selector and their lang attribute as the key field. Key values must be unique within template_data, so two given_name elements cannot share the same lang value. Since exactly two elements are allowed, and each lang can only be ENG or CES, one has to be ENG and the other CES.
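
For reference, here is a sketch of how the fragments above might fit together in a single schema file. It assumes the documents use no target namespace, as in the question's sample; with a target namespace, the selector XPath would need a prefix bound to that namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: assumes no target namespace, as in the sample documents. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <!-- Text content plus a required lang attribute limited to ENG or CES -->
    <xs:complexType name="languageType">
        <xs:simpleContent>
            <xs:extension base="xs:string">
                <xs:attribute name="lang" use="required">
                    <xs:simpleType>
                        <xs:restriction base="xs:NMTOKEN">
                            <xs:enumeration value="ENG"/>
                            <xs:enumeration value="CES"/>
                        </xs:restriction>
                    </xs:simpleType>
                </xs:attribute>
            </xs:extension>
        </xs:simpleContent>
    </xs:complexType>

    <xs:element name="template_data">
        <xs:complexType>
            <xs:sequence>
                <!-- Exactly two given_name children -->
                <xs:element name="given_name" type="languageType" minOccurs="2" maxOccurs="2"/>
            </xs:sequence>
        </xs:complexType>
        <!-- No two given_name children may share the same lang value -->
        <xs:key name="languageKey">
            <xs:selector xpath="given_name"/>
            <xs:field xpath="@lang"/>
        </xs:key>
    </xs:element>

</xs:schema>
```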

Now these XML documents validate:

<template_data>
    <given_name lang="ENG">Zluty</given_name>
    <given_name lang="CES">Žlutý</given_name>
</template_data>

<template_data>
    <given_name lang="CES">Žlutý</given_name>
    <given_name lang="ENG">Zluty</given_name>
</template_data>

But these don't:

<template_data>
    <given_name lang="FRA">Zluty</given_name>
    <given_name lang="CES">Žlutý</given_name>
    <given_name lang="ENG">Zluty</given_name>
</template_data>

<template_data>
    <given_name lang="ENG">Zluty</given_name>
    <given_name lang="ENG">Zluty</given_name>
</template_data>

<template_data>
    <given_name lang="ENG">Zluty</given_name>
</template_data>

<template_data>
    <given_name>Zluty</given_name>
    <given_name lang="ENG">Zluty</given_name>
</template_data>
helderdarocha
  • Thank you, this is precisely what I was looking for! Out of curiosity, would this still work with another element, e.g. last_name, with similar rules as given_name? I mean two given_names with lang CES and ENG, and two last_names with lang CES and ENG. – Humungus Mar 12 '14 at 17:52
  • Yes, but you would have to define a second key for the last name, since the keys are unique within a given scope. You also would have to move the occurrence constraints to the `sequence` element if you want to keep first-name and last-name together. You could also wrap them in a parent `name` element, and use only one key for the parent element. – helderdarocha Mar 12 '14 at 18:01
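
The two-key variant described in the last comment might look roughly like the sketch below. This is one reading of that comment, not a tested schema from the thread: the key names givenNameLanguageKey and lastNameLanguageKey are made up for illustration, last_name is assumed to follow the same rules as given_name, and no target namespace is assumed.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: key names are hypothetical; no target namespace assumed. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <!-- Same languageType as in the answer: text plus lang = ENG or CES -->
    <xs:complexType name="languageType">
        <xs:simpleContent>
            <xs:extension base="xs:string">
                <xs:attribute name="lang" use="required">
                    <xs:simpleType>
                        <xs:restriction base="xs:NMTOKEN">
                            <xs:enumeration value="ENG"/>
                            <xs:enumeration value="CES"/>
                        </xs:restriction>
                    </xs:simpleType>
                </xs:attribute>
            </xs:extension>
        </xs:simpleContent>
    </xs:complexType>

    <xs:element name="template_data">
        <xs:complexType>
            <!-- Occurrence constraints moved to the sequence:
                 exactly two (given_name, last_name) pairs -->
            <xs:sequence minOccurs="2" maxOccurs="2">
                <xs:element name="given_name" type="languageType"/>
                <xs:element name="last_name" type="languageType"/>
            </xs:sequence>
        </xs:complexType>
        <!-- One key per element name, each scoped to template_data -->
        <xs:key name="givenNameLanguageKey">
            <xs:selector xpath="given_name"/>
            <xs:field xpath="@lang"/>
        </xs:key>
        <xs:key name="lastNameLanguageKey">
            <xs:selector xpath="last_name"/>
            <xs:field xpath="@lang"/>
        </xs:key>
    </xs:element>

</xs:schema>
```

Note that this sketch only ensures each element name appears once per language. It does not force the given_name and last_name within one pair to share the same lang value. The wrapper-element approach mentioned in the comment would be one way to tie them together.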