I am having trouble writing a regular expression to use in an XML Schema as a restriction on a string element.

The string can currently contain the following values:

HASCALCULATOR, LISTUPDATENEEDED, READ ONLY and MANDATORY.

Each value may appear at most once, and the values can be in any order. The values are separated by spaces (whitespace). Not all values need to be present.

Examples of valid strings:

HASCALCULATOR LISTUPDATENEEDED READ ONLY MANDATORY
HASCALCULATOR READ ONLY
READ ONLY HASCALCULATOR
MANDATORY
<Blank string>

Examples of Invalid strings:

READ ONLY HASCALCULATOR READ ONLY
SOMETHING READ ONLY
READ ONLY SOMETHING HASCALCULATOR LISTUPDATENEEDED READ ONLY MANDATORY

I have made the following expression:

(HASCALCULATOR\s?|READONLY\s?|LISTUPDATENEEDED\s?|MANDATORY\s?){0,4}

But it does not cover all cases; for example, it permits a value to be repeated. If anyone can come up with a better expression, I would be grateful. Note the limitations of regular expressions in XML Schema, which are described here: http://www.regular-expressions.info/xml.html

falsetru
mema

1 Answer

You have several options.

(1) You can enforce the constraint you have described by writing a fairly elaborate regular expression, along the following lines:

  • Every legal value with all four strings present is a concatenation of some permutation of the sequence ("HASCALCULATOR", "READ ONLY", "LISTUPDATENEEDED", "MANDATORY").
  • Every legal value with fewer than four strings present is a prefix of the concatenation of some permutation.

So you can write the regex out in full by calculating the 24 permutations of your four strings, and making suffixes optional:

<xs:simpleType name="properties">
  <xs:restriction base="xs:string">
    <xs:whiteSpace value="collapse"/>
    <xs:pattern value="((HASCALCULATOR( LISTUPDATENEEDED( READ ONLY( MANDATORY)?)?)?)
      |(HASCALCULATOR( LISTUPDATENEEDED( MANDATORY( READ ONLY)?)?)?)
      |(HASCALCULATOR( READ ONLY( LISTUPDATENEEDED( MANDATORY)?)?)?)
      |(HASCALCULATOR( READ ONLY( MANDATORY( LISTUPDATENEEDED)?)?)?)
      |(HASCALCULATOR( MANDATORY( LISTUPDATENEEDED( READ ONLY)?)?)?)
      |(HASCALCULATOR( MANDATORY( READ ONLY( LISTUPDATENEEDED)?)?)?)
      |(LISTUPDATENEEDED( HASCALCULATOR( READ ONLY( MANDATORY)?)?)?)
      |(LISTUPDATENEEDED( HASCALCULATOR( MANDATORY( READ ONLY)?)?)?)
      |(LISTUPDATENEEDED( READ ONLY( HASCALCULATOR( MANDATORY)?)?)?)
      |(LISTUPDATENEEDED( READ ONLY( MANDATORY( HASCALCULATOR)?)?)?)
      |(LISTUPDATENEEDED( MANDATORY( HASCALCULATOR( READ ONLY)?)?)?)
      |(LISTUPDATENEEDED( MANDATORY( READ ONLY( HASCALCULATOR)?)?)?)
      |(READ ONLY( HASCALCULATOR( LISTUPDATENEEDED( MANDATORY)?)?)?)
      |(READ ONLY( HASCALCULATOR( MANDATORY( LISTUPDATENEEDED)?)?)?)
      |(READ ONLY( LISTUPDATENEEDED( HASCALCULATOR( MANDATORY)?)?)?)
      |(READ ONLY( LISTUPDATENEEDED( MANDATORY( HASCALCULATOR)?)?)?)
      |(READ ONLY( MANDATORY( HASCALCULATOR( LISTUPDATENEEDED)?)?)?)
      |(READ ONLY( MANDATORY( LISTUPDATENEEDED( HASCALCULATOR)?)?)?)
      |(MANDATORY( HASCALCULATOR( LISTUPDATENEEDED( READ ONLY)?)?)?)
      |(MANDATORY( HASCALCULATOR( READ ONLY( LISTUPDATENEEDED)?)?)?)
      |(MANDATORY( LISTUPDATENEEDED( HASCALCULATOR( READ ONLY)?)?)?)
      |(MANDATORY( LISTUPDATENEEDED( READ ONLY( HASCALCULATOR)?)?)?)
      |(MANDATORY( READ ONLY( HASCALCULATOR( LISTUPDATENEEDED)?)?)?)
      |(MANDATORY( READ ONLY( LISTUPDATENEEDED( HASCALCULATOR)?)?)?))?">
      <xs:annotation>
        <xs:documentation>
          The pattern here was calculated this way.
          1 Let A = "HASCALCULATOR", B = "LISTUPDATENEEDED", 
            C = "READ ONLY", and D = "MANDATORY".
          2 Calculate the permutations of the sequence (A,B,C,D). 
            A sequence with four members has 4! = 24 permutations: 
            (A,B,C,D), (A,B,D,C), (A,C,B,D), (A,C,D,B), ...
          3 From each permutation generate a regex of the form
            (s1( s2( s3( s4)?)?)?)
            with each separating space inside its optional group,
            so that a shorter value needs no trailing space.
          4 Join all of these in a single optional choice.
          5 The line breaks and indentation in the value attribute
            are for display only; whitespace in a pattern is
            literal, so remove them before using the schema.
        </xs:documentation>
      </xs:annotation>
        </xs:documentation>
      </xs:annotation>
    </xs:pattern>
  </xs:restriction>
</xs:simpleType>
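
Writing out 24 alternatives by hand is error-prone, so it may be easier to generate the pattern. The following is a minimal Python sketch of the four steps listed in the annotation, not part of the schema itself. The names `FLAGS`, `prefix_regex` and `full_pattern` are made up for illustration. It assumes the flag strings contain no regex metacharacters, and it places each separating space inside its optional group so that a shorter value carries no trailing space.

```python
from itertools import permutations
import re

# The four legal values, as given in the question.
FLAGS = ["HASCALCULATOR", "LISTUPDATENEEDED", "READ ONLY", "MANDATORY"]

def prefix_regex(seq):
    """Build (s1( s2( s3( s4)?)?)?) for one ordering of the flags."""
    inner = ""
    # Work outwards from the last flag, wrapping each one in an optional group.
    for flag in reversed(seq[1:]):
        inner = f"( {flag}{inner})?"
    return f"({seq[0]}{inner})"

def full_pattern(flags):
    """Join the regexes for all orderings into a single optional choice."""
    branches = [prefix_regex(p) for p in permutations(flags)]
    return "(" + "|".join(branches) + ")?"

pattern = full_pattern(FLAGS)

if __name__ == "__main__":
    # Paste the printed string into the value attribute of xs:pattern.
    print(pattern)
```

Only groups, `?` and `|` are used, which behave the same way in Python's `re` module and in XML Schema patterns, so `re.fullmatch(pattern, value)` gives a quick way to try the valid and invalid examples from the question before putting the pattern into the schema.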

(2) A less verbose version can be produced by left-factoring the disjunction, so that a construct like

(A( B( C( D)?)?)?)
|(A( B( D( C)?)?)?)
|(A( C( B( D)?)?)?)
|(A( C( D( B)?)?)?)
|(A( D( B( C)?)?)?)
|(A( D( C( B)?)?)?)

becomes something like

(A(( B(( C( D)?)|( D( C)?))?)
  |( C(( B( D)?)|( D( B)?))?)
  |( D(( B( C)?)|( C( B)?))?))?)
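
The left-factored form can also be generated recursively: after choosing the first value, the rest of the string is an optional choice over the remaining values. This is a rough Python sketch under that reading. The function name `factored` is hypothetical, and the grouping it emits is slightly flatter than the hand-written sketch above, though it is built on the same idea.

```python
import re

# The four legal values, as given in the question.
FLAGS = ["HASCALCULATOR", "LISTUPDATENEEDED", "READ ONLY", "MANDATORY"]

def factored(flags, first=True):
    """Regex for any sequence of distinct values from `flags`, including none."""
    if not flags:
        return ""
    # Every value except the very first is preceded by a single space.
    sep = "" if first else " "
    branches = []
    for i, flag in enumerate(flags):
        rest = flags[:i] + flags[i + 1:]  # values still unused after this one
        branches.append(f"{sep}{flag}{factored(rest, first=False)}")
    return "(" + "|".join(branches) + ")?"

pattern = factored(FLAGS)

if __name__ == "__main__":
    print(pattern)
```

For two values A and B this yields `(A( B)?|B( A)?)?`, which shows the shape of the result on a small input.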

(3) You can re-think the representation of the material. You could, for example, treat the presence of any of the four strings as a flag and ignore repetitions; that would allow a pattern like the one you sketched to work.
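
As an illustration of option (3), here is a small Python sketch of how a consumer might treat the values as flags and ignore repetitions. The relaxed pattern and the function name `flags_of` are my own assumptions, not something taken from your system. The pattern uses only constructs that XML Schema patterns also support.

```python
import re

# One legal value; the same alternation could be used in an xs:pattern.
FLAG = "(HASCALCULATOR|LISTUPDATENEEDED|READ ONLY|MANDATORY)"
# Zero or more legal values separated by single spaces; repeats are allowed.
RELAXED = f"({FLAG}( {FLAG})*)?"

def flags_of(value):
    """Return the set of flags present in `value`, ignoring repetitions."""
    value = " ".join(value.split())  # mimic whiteSpace="collapse"
    if not re.fullmatch(RELAXED, value):
        raise ValueError(f"not a list of known flags: {value!r}")
    # FLAG has exactly one capturing group, so findall returns the matched values.
    return set(re.findall(FLAG, value))
```

With this reading, a string that repeats READ ONLY is accepted and simply yields READ ONLY once in the resulting set, while a string containing an unknown word such as SOMETHING is still rejected.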

(4) You could represent the flags as four boolean attributes, so that instead of

<xs:element name="properties" type="tns:properties"/>
<!--* assumes the declaration for 'properties' type
    * given above *-->

you write something like:

<xs:element name="properties">
  <xs:complexType>
    <xs:attribute name="has-calculator" type="xs:boolean"/>
    <xs:attribute name="mandatory" type="xs:boolean"/>
    <xs:attribute name="read-only" type="xs:boolean"/>
    <xs:attribute name="list-update-needed" type="xs:boolean"/>
  </xs:complexType>
</xs:element>
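
If the existing system cannot change but a downstream format can, a legacy string could be converted into this attribute form. The following Python sketch is only an illustration: the mapping `ATTR_FOR` and the function `to_attributes` are hypothetical names, the attribute names are those declared above, and the input is assumed to have been validated already.

```python
import re
import xml.etree.ElementTree as ET

# Legacy value -> attribute name from the declaration above.
ATTR_FOR = {
    "HASCALCULATOR": "has-calculator",
    "MANDATORY": "mandatory",
    "READ ONLY": "read-only",
    "LISTUPDATENEEDED": "list-update-needed",
}

def to_attributes(value):
    """Serialize a legacy flag string as a properties element with boolean attributes."""
    # Assumes `value` contains only known flags; anything else is skipped.
    present = set(re.findall("|".join(ATTR_FOR), value))
    elem = ET.Element("properties")
    for flag, attr in ATTR_FOR.items():
        elem.set(attr, "true" if flag in present else "false")
    return ET.tostring(elem, encoding="unicode")
```

Every attribute is written out explicitly here. Since the attributes are optional in the declaration, omitting the false ones would be an equally valid design.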

(5) You could represent the flags as empty elements, which signal a property by occurring:

<xs:complexType name="empty">
  <xs:sequence/>
</xs:complexType>
<xs:element name="properties">
  <xs:complexType>
    <xs:all>
      <xs:element name="has-calculator" 
                  type="tns:empty" minOccurs="0"/>
      <xs:element name="mandatory" 
                  type="tns:empty" minOccurs="0"/>
      <xs:element name="read-only" 
                  type="tns:empty" minOccurs="0"/>
      <xs:element name="list-update-needed" 
                  type="tns:empty" minOccurs="0"/>
    </xs:all>
  </xs:complexType>
</xs:element>

I'd be inclined to use option (5), myself. But from the general feel of the question, coupled with the all-caps strings, I guess you are dealing with output from a well-established system and changing the format is not feasible.

C. M. Sperberg-McQueen