0

I am trying to convert a FLAT file to XML using DFDL. It has following format: Each element is 5 byte.All are in same line but i am separating them to avoid confusion. I will address element by first letter in them.

0AAAA  
81AAA  
eeeee  
qqqqq    
82BBB    
rrrrr  
sssss  
9QQQQ  

Now 0 and 9 are grandparents we don't have to worry about them. 8 is parent and second byte of 81AAA(that is 1) will determine the format of its children. There can be many 8 and many children of a 8 parent(but all of them will have same format).
I tried one schema but once it go into children(eeeee) its not coming out of it and every record is being printed in children format only.

dhilmathy
  • 2,800
  • 2
  • 21
  • 29
Rishabh
  • 43
  • 10

1 Answers1

0

Below is a schema that I think describes your data, tested on Daffodil 2.2.0:

<xs:schema
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:fn="http://www.w3.org/2005/xpath-functions"
  xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">

  <xs:include schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd" />

  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:format ref="GeneralFormat" />
    </xs:appinfo>
  </xs:annotation>

  <xs:element name="Root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="GrandParent" maxOccurs="unbounded">
          <xs:complexType>
            <xs:choice dfdl:initiatedContent="yes">
              <xs:element name="Zero" dfdl:initiator="0">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="Value" type="xs:string" dfdl:length="4" dfdl:lengthKind="explicit" />
                    <xs:element ref="Eight" minOccurs="0" maxOccurs="unbounded" />
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
              <xs:element name="Nine" dfdl:initiator="9">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="Value" type="xs:string" dfdl:length="4" dfdl:lengthKind="explicit" />
                    <xs:element ref="Eight" minOccurs="0" maxOccurs="unbounded" />
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
            </xs:choice>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="Eight" dfdl:initiator="8">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="ChildrenFormat" type="xs:string" dfdl:length="1" dfdl:lengthKind="explicit" />
        <xs:element name="Value" type="xs:string" dfdl:length="3" dfdl:lengthKind="explicit" />
        <xs:choice dfdl:choiceDispatchKey="{ ./ChildrenFormat }">
          <xs:element ref="One" maxOccurs="unbounded" dfdl:choiceBranchKey="1" />
          <xs:element ref="Two" maxOccurs="unbounded" dfdl:choiceBranchKey="2" />
        </xs:choice>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="One" type="xs:string" dfdl:length="5" dfdl:lengthKind="explicit">
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:discriminator test="{ fn:not(fn:starts-with(., '8') or fn:starts-with(., '9')) }" />
      </xs:appinfo>
    </xs:annotation>
  </xs:element>

  <xs:element name="Two" type="xs:string" dfdl:length="5" dfdl:lengthKind="explicit">
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:discriminator test="{ fn:not(fn:starts-with(., '8') or fn:starts-with(., '9')) }" />
      </xs:appinfo>
    </xs:annotation>
  </xs:element>

</xs:schema>

A description of how this works:

  • The Root of the data is an unbounded number of GrandParent elements
  • Each GrandParent element contains either a Zero or a Nine, based on the initiator. The initiator consumes the first of the 5 bytes of the grandparent data
  • The Zero/Nine elements contain a Value which consumes the remaining 4 bytes of the gradparent data
  • Following the Value is zero or more Eight elements
  • Each Eight element has an initiator of "8", consuming the first of 5 bytes
  • Each Eight element has a ChildrenFormat, consuming the second of 5 bytes
  • Each Eight element has a Value, consuming the last 3 of 5 bytes
  • Each Eight element has an unbounded number of either all One or all Two elements
  • A choiceDispatchKey/Branch is used to determine whether to parse all One or all Two elements, dispatching off of the ChildrenFormat element
  • Each One or Two element consumes 5 bytes
  • In order to determine when the unbounded number of One or Two elements ends, a discriminator is placed on the One/Two elements. This discriminator fails when the data parsed as a One/Two does not start with an '8' or a '9'.
  • Also, all fields are treated as strings for simplicity

With this, your example data parses to an infoset like so:

<Root>
  <GrandParent>
    <Zero>
      <Value>AAAA</Value>
      <Eight>
        <ChildrenFormat>1</ChildrenFormat>
        <Value>AAA</Value>
        <One>eeeee</One>
        <One>qqqqq</One>
      </Eight>
      <Eight>
        <ChildrenFormat>2</ChildrenFormat>
        <Value>BBB</Value>
        <Two>rrrrr</Two>
        <Two>sssss</Two>
      </Eight>
    </Zero>
  </GrandParent>
  <GrandParent>
    <Nine>
      <Value>QQQQ</Value>
    </Nine>
  </GrandParent>
</Root>
stevedlawrence
  • 466
  • 3
  • 6
  • Thanks Steve you are awesome man but group of One or Two element ends only when some Eight element comes along they don't have anything else(like lowercase letter) to separate them from Eight element. – Rishabh Oct 24 '18 at 05:34
  • I've modified the discriminators on the One and Two elements to no longer check for lower-case, but instead to fail if the One/Two data starts with an 8 or a 9. – stevedlawrence Oct 24 '18 at 11:18