Breaking out of loop in DFDL

Question

I am trying to convert a flat file to XML using DFDL. It has the following format: each element is 5 bytes. All elements are on the same line, but I am separating them here to avoid confusion. I will refer to each element by its first letter.

0AAAA  
81AAA  
eeeee  
qqqqq    
82BBB    
rrrrr  
sssss  
9QQQQ

Here 0 and 9 are grandparents; we don't have to worry about them. 8 is a parent, and the second byte of 81AAA (that is, 1) determines the format of its children. There can be many 8 records, and each 8 parent can have many children (but all children of one parent have the same format).
I tried one schema, but once it goes into the children (eeeee) it does not come back out, and every following record is printed in the children format only.

stevedlawrence · Accepted Answer · 2018-10-24T11:16:03.657

Below is a schema that I think describes your data, tested on Daffodil 2.2.0:

<xs:schema
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:fn="http://www.w3.org/2005/xpath-functions"
  xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">

  <xs:include schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd" />

  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:format ref="GeneralFormat" />
    </xs:appinfo>
  </xs:annotation>

  <xs:element name="Root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="GrandParent" maxOccurs="unbounded">
          <xs:complexType>
            <xs:choice dfdl:initiatedContent="yes">
              <xs:element name="Zero" dfdl:initiator="0">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="Value" type="xs:string" dfdl:length="4" dfdl:lengthKind="explicit" />
                    <xs:element ref="Eight" minOccurs="0" maxOccurs="unbounded" />
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
              <xs:element name="Nine" dfdl:initiator="9">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="Value" type="xs:string" dfdl:length="4" dfdl:lengthKind="explicit" />
                    <xs:element ref="Eight" minOccurs="0" maxOccurs="unbounded" />
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
            </xs:choice>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="Eight" dfdl:initiator="8">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="ChildrenFormat" type="xs:string" dfdl:length="1" dfdl:lengthKind="explicit" />
        <xs:element name="Value" type="xs:string" dfdl:length="3" dfdl:lengthKind="explicit" />
        <xs:choice dfdl:choiceDispatchKey="{ ./ChildrenFormat }">
          <xs:element ref="One" maxOccurs="unbounded" dfdl:choiceBranchKey="1" />
          <xs:element ref="Two" maxOccurs="unbounded" dfdl:choiceBranchKey="2" />
        </xs:choice>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="One" type="xs:string" dfdl:length="5" dfdl:lengthKind="explicit">
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:discriminator test="{ fn:not(fn:starts-with(., '8') or fn:starts-with(., '9')) }" />
      </xs:appinfo>
    </xs:annotation>
  </xs:element>

  <xs:element name="Two" type="xs:string" dfdl:length="5" dfdl:lengthKind="explicit">
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:discriminator test="{ fn:not(fn:starts-with(., '8') or fn:starts-with(., '9')) }" />
      </xs:appinfo>
    </xs:annotation>
  </xs:element>

</xs:schema>

A description of how this works:

The Root of the data is an unbounded number of GrandParent elements
Each GrandParent element contains either a Zero or a Nine, based on the initiator. The initiator consumes the first of the 5 bytes of the grandparent data
The Zero/Nine elements contain a Value which consumes the remaining 4 bytes of the grandparent data
Following the Value is zero or more Eight elements
Each Eight element has an initiator of "8", consuming the first of 5 bytes
Each Eight element has a ChildrenFormat, consuming the second of 5 bytes
Each Eight element has a Value, consuming the last 3 of 5 bytes
Each Eight element has an unbounded number of either all One or all Two elements
A choiceDispatchKey/choiceBranchKey pair is used to determine whether to parse all One or all Two elements, dispatching off of the ChildrenFormat element
Each One or Two element consumes 5 bytes
In order to determine when the unbounded number of One or Two elements ends, a discriminator is placed on the One/Two elements. This discriminator fails when the data parsed as a One/Two starts with an '8' or a '9', which ends the array and lets the parser back out to the next Eight or GrandParent.
Also, all fields are treated as strings for simplicity

With this, your example data parses to an infoset like so:

<Root>
  <GrandParent>
    <Zero>
      <Value>AAAA</Value>
      <Eight>
        <ChildrenFormat>1</ChildrenFormat>
        <Value>AAA</Value>
        <One>eeeee</One>
        <One>qqqqq</One>
      </Eight>
      <Eight>
        <ChildrenFormat>2</ChildrenFormat>
        <Value>BBB</Value>
        <Two>rrrrr</Two>
        <Two>sssss</Two>
      </Eight>
    </Zero>
  </GrandParent>
  <GrandParent>
    <Nine>
      <Value>QQQQ</Value>
    </Nine>
  </GrandParent>
</Root>
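
For readers who find the control flow easier to follow in imperative form, the steps described above can be mirrored in a small Python sketch. This is not part of Daffodil or DFDL; the names `split_records`, `parse_root`, and `CHILD_NAMES` are hypothetical, and the sketch assumes the input is a single string whose length is a multiple of 5. It only illustrates where the "break out of the loop" decision is made: the inner loop stops at a record starting with '8' or '9', just as the discriminator does.

```python
# Hypothetical reference parser; it mimics the schema's logic, not Daffodil itself.
RECORD_WIDTH = 5
CHILD_NAMES = {"1": "One", "2": "Two"}  # mirrors the choiceBranchKey values


def split_records(data):
    """Cut the flat input into fixed-width 5-character records."""
    if len(data) % RECORD_WIDTH != 0:
        raise ValueError("input length is not a multiple of the record width")
    return [data[i:i + RECORD_WIDTH] for i in range(0, len(data), RECORD_WIDTH)]


def parse_root(data):
    """Return a list of grandparents, each holding its Eight records."""
    records = split_records(data)
    grandparents = []
    i = 0
    while i < len(records):
        rec = records[i]
        # GrandParent: the first character plays the role of the '0'/'9' initiator.
        if rec[0] not in "09":
            raise ValueError(f"expected a grandparent record, got {rec!r}")
        grandparent = {
            "name": "Zero" if rec[0] == "0" else "Nine",
            "value": rec[1:],
            "eights": [],
        }
        i += 1
        # Zero or more Eight records follow, each starting with '8'.
        while i < len(records) and records[i][0] == "8":
            fmt, value = records[i][1], records[i][2:]
            if fmt not in CHILD_NAMES:
                raise ValueError(f"unknown children format {fmt!r}")
            i += 1
            children = []
            # Counterpart of the discriminator: stop at a record starting with '8' or '9'.
            while i < len(records) and records[i][0] not in "89":
                children.append(records[i])
                i += 1
            if not children:
                # minOccurs defaults to 1 in XML Schema, so one child is expected.
                raise ValueError("an Eight record must have at least one child")
            grandparent["eights"].append({
                "children_format": fmt,
                "value": value,
                "child_name": CHILD_NAMES[fmt],
                "children": children,
            })
        grandparents.append(grandparent)
    return grandparents
```

Calling `parse_root("0AAAA81AAAeeeeeqqqqq82BBBrrrrrsssss9QQQQ")` on the example data yields a nested structure with the same shape as the infoset above. Like the schema, the sketch would treat a record starting with '0' that appears among the children as just another child, since only '8' and '9' end the child loop.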

Thanks Steve, you are awesome! But a group of One or Two elements ends only when an Eight element comes along; the children don't have anything else (like lowercase letters) to distinguish them from an Eight element. — Rishabh, Oct 24 '18 at 05:34
I've modified the discriminators on the One and Two elements to no longer check for lower-case, but instead to fail if the One/Two data starts with an 8 or a 9. — stevedlawrence, Oct 24 '18 at 11:18
