I have an XML file that looks like the following, and I need to validate it.

<?xml version="1.0" encoding="iso-8859-1"?>
    <MyAttributes
      Att1="00:00:00"
      Att2="00:05:00"
      Att3="00:05:00"
      Att4="foo,bar,true,true,,,0253d1f0-27d6-4d90-9d35-e396007db787"
      Att5="abc,def,false,true,,,4534234-65d6-6590-5535-da2007db787"
      ....
      ..../>

I want to validate the XML file using an XSD schema that enforces the following rules:

  1. MyAttributes contains Att1, Att2 and Att3.
  2. The values of Att1, Att2 and Att3 are of the type TimeSpan.
  3. All the other attributes in MyAttributes are in CSV format with 7 columns:
    • col1 and col2 should be non-empty strings
    • col3 and col4 should be boolean
    • col5 and col6 are strings and can be empty
    • col7 should be of type GUID

Is there a way I can validate this with some kind of regex assertion using XSD 1.1?

user330612

3 Answers


The xs:time type will validate the timespan fields, provided their values are valid times of day in hh:mm:ss form. For the other fields, you can use a restriction of the xs:string type with a regular expression pattern. This XSD will validate the example XML you posted:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
    <xs:simpleType name="CsvType">
        <xs:restriction base="xs:string">
            <xs:pattern value="\w+,\w+,(true|false),(true|false),\w*,\w*,[A-Fa-f0-9]{7,8}(-[A-Fa-f0-9]{4}){3}-[A-Fa-f0-9]{11,12}"></xs:pattern>
        </xs:restriction>
    </xs:simpleType>
    <xs:element name="MyAttributes">
        <xs:complexType>
            <xs:attribute name="Att1" type="xs:time" />
            <xs:attribute name="Att2" type="xs:time" />
            <xs:attribute name="Att3" type="xs:time" />
            <xs:attribute name="Att4" type="CsvType" />
            <xs:attribute name="Att5" type="CsvType" />
        </xs:complexType>
    </xs:element>
</xs:schema>

You don't really need XSD 1.1 assertions, unless you want to validate contents of one attribute in relation to the contents of the other.

helderdarocha
  • This works. Thanks for the explanation. However there is one caveat. The MyAttributes node can contain any number of attributes with value csv type not just 2. So I need to validate that all these attributes ( can be 5 or 10 or 100) follow the same csv format. How can I do that? – user330612 Apr 03 '14 at 01:03
  • Any attribute (or element) that declares its type as `CsvType` in the example above will use that expression to validate its contents. – helderdarocha Apr 03 '14 at 01:07
  • I'm guessing I should use something like to do this. What I'm struggling with is that the attribute names from 5 onwards are unbounded and can be anything, so I don't want to hard-code the attribute names. But I still want a restriction that **any** attributes added to the XML file, other than the first 3, follow the CSV format. Does that make sense? – user330612 Apr 03 '14 at 01:23
  • You can't control attribute order in XML. You can use `xs:anyAttribute` to allow an element to have any attributes, but can't control how many will have certain contents. You can control the order of **elements** with a `xs:sequence`. Still, you will need XSD 1.1 to validate the contents of an element based on the contents of the others. – helderdarocha Apr 03 '14 at 01:42

This regex validates your TimeSpan lines:

"(\d\d):(60|([0-5][0-9])):(60|([0-5][0-9]))"

If it matches, the line is valid. I got the regex from the first answer in this question.

And for your GUID lines, if this matches the line, then it's valid:

"(?:\w+,){2}(?:(?:true|false),){2}(?:\w*,){2}(?:[0-9a-fA-F]{7,8}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{11,12})"

Although the first GUID in your demo-input line matches the regex from the first answer in this question, the second one does not, because it has a different number of characters in certain elements. I changed it so it matches both.
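
If these expressions are moved into an XSD `xs:pattern` facet instead of a general-purpose regex engine, two differences matter: XSD patterns are implicitly anchored to the whole value, and the XSD regex dialect has no non-capturing `(?:...)` groups, so plain groups have to be used. A minimal sketch of the TimeSpan check as a schema type follows; the type name `TimeSpanType` is a hypothetical name chosen for illustration and does not appear in the question.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <!-- TimeSpanType is an illustrative name, not part of the original post -->
    <xs:simpleType name="TimeSpanType">
        <xs:restriction base="xs:string">
            <!-- Same TimeSpan regex as above, with the redundant inner groups dropped.
                 No ^ or $ needed: xs:pattern always matches the entire value. -->
            <xs:pattern value="\d\d:(60|[0-5][0-9]):(60|[0-5][0-9])"/>
        </xs:restriction>
    </xs:simpleType>
</xs:schema>
```

An attribute declared with `type="TimeSpanType"` would then accept values such as `00:05:00`. The GUID expression can be converted the same way by replacing each `(?:` with `(`.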

aliteralmind

You can use xs:anyAttribute to allow any attribute at all, but then you can't control the name or type of the attribute. You can only define the type for attributes that are explicitly named in the schema. As you suggest, to handle the general case you will need an XSD 1.1 assertion. This could be of the form:

<xs:assert test="every $a in @* satisfies (
        (name($a) = ('Att1', 'Att2', 'Att3') and $a castable as xs:time) or
        (matches(name($a), 'Att\d+') and matches($a, some-regex)))"/>

where some-regex is the regular expression others have supplied, anchored with ^ at the start and $ at the end so it matches the whole string and not some substring.
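
Putting the pieces together, a complete XSD 1.1 schema might look like the sketch below. It rests on several assumptions that are not stated in the question: that Att1, Att2 and Att3 are mandatory (`use="required"`), that every other attribute must follow the CSV format whatever its name (so the `Att\d+` name test is dropped), and that the CSV pattern from the first answer is the one wanted. It also needs an XSD 1.1 processor, since `xs:assert` is not available in XSD 1.0.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="MyAttributes">
        <xs:complexType>
            <!-- Assumption: the three TimeSpan attributes are mandatory -->
            <xs:attribute name="Att1" type="xs:time" use="required"/>
            <xs:attribute name="Att2" type="xs:time" use="required"/>
            <xs:attribute name="Att3" type="xs:time" use="required"/>
            <!-- Allow any further attributes; their values are checked by the assertion -->
            <xs:anyAttribute processContents="skip"/>
            <!-- Every attribute is either one of the three named ones (already typed above)
                 or has a value matching the anchored CSV pattern.
                 string($a) is used so that matches() always receives a string. -->
            <xs:assert test="every $a in @* satisfies (
                name($a) = ('Att1', 'Att2', 'Att3') or
                matches(string($a), '^\w+,\w+,(true|false),(true|false),\w*,\w*,[A-Fa-f0-9]{7,8}(-[A-Fa-f0-9]{4}){3}-[A-Fa-f0-9]{11,12}$'))"/>
        </xs:complexType>
    </xs:element>
</xs:schema>
```

With this shape the named attributes are validated by their declared type and the assertion only has to cover the open-ended ones, so new attributes can be added to the XML file without touching the schema.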

Michael Kay