
I have an XML file in which all text must be enclosed within a leaf element, i.e. if an element contains a child element, it cannot also contain text.

Example:

This is allowed:

<?xml version='1.0' encoding='UTF-8'?>
<mainParent source="">


<!-- FOR TESTING ONLY -->


    <child1 scheduling=""> <!-- 1ST -->
        <child2 domain="" type=""><![CDATA[ALLOWED TEXT]]></child2>
    </child1>

    <child1 scheduling=""> <!-- 2nd -->
        <child2 domain="" type=""><![CDATA[ALLOWED TEXT 2]]></child2>
    </child1>

</mainParent>

This is not allowed:

<?xml version='1.0' encoding='UTF-8'?>
<mainParent source="">


<!-- FOR TESTING ONLY -->


    <child1 scheduling=""> <!-- 1ST -->
        <child2 domain="" type=""><![CDATA[ALLOWED TEXT]]></child2>
        NOT ALLOWED TEXT 1
    </child1>

    <child1 scheduling=""> <!-- 2nd -->
        <child2 domain="" type=""><![CDATA[ALLOWED TEXT 2]]></child2>
    </child1>

    NOT ALLOWED TEXT 2

</mainParent>

Using DOM:

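        // Needs javax.xml.parsers.*, org.w3c.dom.Document and java.io.File
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        dbFactory.setXIncludeAware(true);
        dbFactory.setNamespaceAware(true);

        DocumentBuilder docBuilder = dbFactory.newDocumentBuilder();

        // Only well-formedness is checked here, so mixed content is accepted
        Document doc = docBuilder.parse(new File(fileName));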

The parser fails only if an attribute is repeated or an element is not closed. I would like it to also fail if a text entry is 'hanging'.

How can I enforce this?

Thanks.

Mugoma J. Okomba
  • Do you want to parse a particular class of XML documents for which you could write a schema defining all the elements? In that case you could validate your instance documents against the schema, and you would get a validation error whenever an element defined to have element-only content actually contained mixed content. You cannot change the well-formedness check that a normal XML parser performs; you would need to write a schema (W3C XML Schema, Schematron, RELAX NG) to implement your requirement. – Martin Honnen Jun 18 '16 at 14:41
  • @MartinHonnen I don't use a schema. I thought I could handle the issue without resorting to a schema. – Mugoma J. Okomba Jun 18 '16 at 14:53
  • Well, the XML specification defines what is allowed and what is not, and as long as you only check the well-formedness of an input document, any element is allowed to contain child elements as well as child text nodes (mixed content). There are no settings that make an XML parser restrict the input, unless you define a DTD or schema prescribing the allowed element content. – Martin Honnen Jun 18 '16 at 15:14
  • You could also traverse the doc yourself after parsing and check your conditions, but using standards like schema might make sense - my2c – Stefan Hegny Jun 18 '16 at 20:51
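
The traversal suggested in the last comment could look roughly like the sketch below. It is only an illustration, not a definitive implementation: the class name `MixedContentCheck` and the helpers `findHangingText` and `parse` are hypothetical, and the rule it assumes is "an element that has at least one child element must not contain non-whitespace text or CDATA of its own". Comments and whitespace-only text are ignored.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class MixedContentCheck {

    /**
     * Walks the tree in document order and returns the first "hanging" text
     * (trimmed), or null if no element mixes child elements with text.
     */
    static String findHangingText(Node node) {
        // First pass: does this node have at least one child element?
        boolean hasElementChild = false;
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                hasElementChild = true;
                break;
            }
        }
        // Second pass: report stray text, and recurse into child elements.
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            short type = c.getNodeType();
            boolean isText = type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE;
            if (hasElementChild && isText && !c.getNodeValue().trim().isEmpty()) {
                return c.getNodeValue().trim();
            }
            if (type == Node.ELEMENT_NODE) {
                String found = findHangingText(c);
                if (found != null) {
                    return found;
                }
            }
        }
        return null;
    }

    /** Parses an XML string with a namespace-aware DOM parser (well-formedness only). */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        String allowed = "<mainParent><child1><child2><![CDATA[ALLOWED TEXT]]></child2></child1></mainParent>";
        String notAllowed = "<mainParent><child1><child2>ok</child2>NOT ALLOWED TEXT 1</child1></mainParent>";
        System.out.println(findHangingText(parse(allowed)));
        System.out.println(findHangingText(parse(notAllowed)));
    }
}
```

In the question's code, the same check could be run on `doc` right after `docBuilder.parse(...)`, throwing an exception when the result is not null so that the document is rejected in the same place as a parse failure.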

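The schema route from the earlier comments can be sketched with the standard `javax.xml.validation` API. The XSD below is an assumption reconstructed from the sample document in the question (element and attribute names are taken from it, all attributes are treated as optional strings); the class name `SchemaCheck` and the method `isValid` are hypothetical. Because `mainParent` and `child1` are declared with element-only content, stray text inside them should be reported as a validation error.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class SchemaCheck {

    // Assumed schema: mainParent and child1 hold elements only, child2 holds text only.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + " <xs:element name='mainParent'>"
        + "  <xs:complexType>"
        + "   <xs:sequence>"
        + "    <xs:element name='child1' maxOccurs='unbounded'>"
        + "     <xs:complexType>"
        + "      <xs:sequence>"
        + "       <xs:element name='child2' maxOccurs='unbounded'>"
        + "        <xs:complexType>"
        + "         <xs:simpleContent>"
        + "          <xs:extension base='xs:string'>"
        + "           <xs:attribute name='domain' type='xs:string'/>"
        + "           <xs:attribute name='type' type='xs:string'/>"
        + "          </xs:extension>"
        + "         </xs:simpleContent>"
        + "        </xs:complexType>"
        + "       </xs:element>"
        + "      </xs:sequence>"
        + "      <xs:attribute name='scheduling' type='xs:string'/>"
        + "     </xs:complexType>"
        + "    </xs:element>"
        + "   </xs:sequence>"
        + "   <xs:attribute name='source' type='xs:string'/>"
        + "  </xs:complexType>"
        + " </xs:element>"
        + "</xs:schema>";

    /** Returns true if the XML string validates against the assumed schema. */
    static boolean isValid(String xml) throws IOException, SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            // With no ErrorHandler set, validation errors are thrown as SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String allowed = "<mainParent source=''><child1 scheduling=''>"
            + "<child2 domain='' type=''><![CDATA[ALLOWED TEXT]]></child2></child1></mainParent>";
        String notAllowed = "<mainParent source=''><child1 scheduling=''>"
            + "<child2 domain='' type=''>ok</child2>NOT ALLOWED TEXT 1</child1></mainParent>";
        System.out.println(isValid(allowed));
        System.out.println(isValid(notAllowed));
    }
}
```

If the DOM is still needed, the compiled `Schema` can instead be passed to `DocumentBuilderFactory.setSchema(...)` so that parsing and validation happen in one step; in that case an `ErrorHandler` that rethrows errors has to be set on the `DocumentBuilder`, since validation errors are otherwise only reported, not thrown.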
0 Answers