Related to "In XSD I want to specify that an element can only have whitespace content" and "In XSD how do I allow only whitespace in an element's content?": I have XML data files for which I've created XSD files. After generating the XSD files and testing them against input, I found that the incoming data files often contain a pattern like the following, where an element that takes no text still has whitespace between its start and end tags:

<source
  id="UGCStrain"
  name="The Strain Complex"
  abbrev="The Strain">
</source>

Currently, my XSD has a lot of elements like the following that have attributes, and sometimes children, but don't use embedded text:

<xs:element name="source">
  <xs:complexType>
    <xs:attribute name="id" use="required" type="uniqueID"/>
    <xs:attribute name="name" use="required" type="xs:string"/>
    <xs:attribute name="abbrev" type="xs:string" default=""/>
    <xs:attribute name="description" type="xs:string" default=""/>
  </xs:complexType>
</xs:element>

Others have text that I want to retain (and which is, in some cases, required). For example, this expression to indicate certain tagged elements need to be added:

<enmasse
   stage="init">
  component.Skill
</enmasse>

with corresponding XSD:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">

  <xs:element name="autotag">
    <xs:complexType>
      <xs:attribute name="group" use="required"/>
      <xs:attribute name="tag" use="required"/>
    </xs:complexType>
  </xs:element>
  
  <xs:element name="enmasse">
    <xs:complexType mixed="true">
      <xs:sequence minOccurs="0">
        <xs:element maxOccurs="1" ref="autotag"/>
      </xs:sequence>
      <xs:attribute name="stage" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>

As per the two linked questions, it is possible to create a type that allows whitespace-only text without raising an error, but that requires every such element to be given that type. Is there any way to make this work for every element, so that any complexType without mixed="true" accepts whitespace-only "text"?

If it's relevant, I'm doing the XSD validation with the Python xmlschema library.

kjhughes
Sean Duggan
  • You need to be more specific about the requirements. Do you really mean that _any_ complex type without mixed=true should be treated like this? Or do you have in mind a specific pattern of complex type (perhaps one without any child tags)? You have provided some examples, but I don't see any example of this 'ignorable' whitespace. – kimbert Oct 30 '21 at 11:00
  • re: 'but it requires every such element to be given that type'...what exactly do you mean by that comment? Are you seeking some way to avoid creating an element declaration for each tag in the input XML? – kimbert Oct 30 '21 at 11:01
  • @kimbert: Exactly. I'd like elements that only have whitespace to be treated as empty without having to explicitly handle it for every element definition. Whitespace is important for items without white-space, so I don't think I can use the options to eliminate it. I suppose it would work if it were just for the XSD evaluation, but it would also make it awkward when sharing this for others (the creators of the data format never provided an XSD schema, although the format is documented). – Sean Duggan Oct 30 '21 at 21:13

1 Answer

Your requirements seem to be

a) if a tag contains only whitespace then collapse the white space to an empty string.

b) if a tag contains a mixture of white space and non-whitespace characters, do not suppress any of the white space.

c) automatically do a) and b) for every tag that has a text-only value, regardless of whether the tag is declared in the XSD

Rules a) and b) could be achieved using xs:whiteSpace facets on the simple types in the XSD. Rule c) is not possible, because XML Schema only applies to tags that have an element declaration in the XSD. You can tolerate undeclared tags using xs:any, but XML Schema will not apply any rules to their content.

I think you should pre-process your XML using XSLT and apply rule a) to the whitespace-only tags. Then you could continue to use XML Schema to parse and validate the pre-processed XML.
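As a rough illustration of that pre-processing step, here is a minimal sketch in Python using the standard-library xml.etree.ElementTree instead of XSLT (a swapped-in technique, chosen only because the question mentions Python). The helper name strip_whitespace_only_text and the <sources> wrapper element are hypothetical, and the sketch assumes that only childless elements should be emptied:

```python
import xml.etree.ElementTree as ET


def strip_whitespace_only_text(root):
    """Empty out childless elements whose text is whitespace only (rule a).

    Elements that contain real text, or that have child elements, are
    left untouched (rule b).
    """
    for elem in root.iter():
        has_children = len(elem) > 0
        if not has_children and elem.text is not None and not elem.text.strip():
            elem.text = None  # the element now serializes as an empty element
    return root


# Hypothetical sample: the <sources> wrapper is an assumption for illustration;
# the two inner elements are taken from the question.
SAMPLE = """<sources>
  <source
    id="UGCStrain"
    name="The Strain Complex"
    abbrev="The Strain">
  </source>
  <enmasse
     stage="init">
    component.Skill
  </enmasse>
</sources>"""

root = strip_whitespace_only_text(ET.fromstring(SAMPLE))
cleaned = ET.tostring(root, encoding="unicode")
```

The cleaned string could then be passed to the validator in place of the raw file. Whether emptying only childless elements is the right rule depends on the data format, so treat that condition as an assumption to adjust.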

kimbert