I am trying to convert a really complex fixed-length file into XML using DFDL and Daffodil. Each line holds one element, and the first field of each line tells me what kind of element it is: Parent A, Parent B, or one of the children AA, AB, BB, or BA.

Parent A is one element, Parent B is another, and Child AA is the first child of Parent A.

A single file contains multiple Parent A and Parent B elements. I tried the initiator property and even a choice, but nothing seems to work. Can anyone please help me out?

Rishabh

1 Answer


It's difficult to give a complete answer without example data, but using initiators and choices is likely the right approach. There are potentially simpler schemas depending on the specific data, but a generic solution might look something like this:

<xs:schema
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">

  <xs:include schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd" />

  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:format ref="GeneralFormat" lengthKind="delimited" />
    </xs:appinfo>
  </xs:annotation>

  <xs:element name="File">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Record" maxOccurs="unbounded">
          <xs:complexType>
            <xs:choice dfdl:initiatedContent="yes">
              <xs:element name="ParentA" dfdl:initiator="ParentA:">
                <xs:complexType>
                  <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="postfix">
                    <xs:element name="Content" type="xs:string"/>
                    <xs:element name="Record" maxOccurs="unbounded">
                      <xs:complexType>
                        <xs:choice dfdl:initiatedContent="yes">
                          <xs:element name="ChildAA"  type="xs:string" dfdl:initiator="ChildAA:" />
                          <xs:element name="ChildAB"  type="xs:string" dfdl:initiator="ChildAB:" />
                        </xs:choice>
                      </xs:complexType>
                    </xs:element>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
              <xs:element name="ParentB" dfdl:initiator="ParentB:">
                <xs:complexType>
                  <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="postfix">
                    <xs:element name="Content" type="xs:string" />
                    <xs:element name="Record" maxOccurs="unbounded">
                      <xs:complexType>
                        <xs:choice dfdl:initiatedContent="yes">
                          <xs:element name="ChildBA" type="xs:string" dfdl:initiator="ChildBA:" />
                          <xs:element name="ChildBB" type="xs:string" dfdl:initiator="ChildBB:" />
                        </xs:choice>
                      </xs:complexType>
                    </xs:element>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
            </xs:choice>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>

This schema has the following features:

  • Each File has an unbounded number of Records.
  • Each Record is a choice of either a ParentA or ParentB element, determined by the dfdl:initiator property.
  • Each Parent element contains the Content for that Parent (i.e. the text following the parent initiator) followed by an unbounded number of child Records.
  • Each child Record is also determined by the dfdl:initiator property.
  • A postfix newline separator determines where Parent content and Child content end.
  • ChildB elements cannot appear after a ParentA element, and vice versa; child elements must always follow their associated parent element. (If this restriction weren't important, the schema could be greatly simplified.)

The above allows data like this:

ParentA:Parent A Content
ChildAA:Child AA Content
ChildAB:Child AB Content
ParentB:Parent B Content
ChildBB:Child BB Content
ParentA:Parent A Content
ChildAB:Child AB Content

This would parse into an XML infoset like this:

<File>
  <Record>
    <ParentA>
      <Content>Parent A Content</Content>
      <Record>
        <ChildAA>Child AA Content</ChildAA>
      </Record>
      <Record>
        <ChildAB>Child AB Content</ChildAB>
      </Record>
    </ParentA>
  </Record>
  <Record>
    <ParentB>
      <Content>Parent B Content</Content>
      <Record>
        <ChildBB>Child BB Content</ChildBB>
      </Record>
    </ParentB>
  </Record>
  <Record>
    <ParentA>
      <Content>Parent A Content</Content>
      <Record>
        <ChildAB>Child AB Content</ChildAB>
      </Record>
    </ParentA>
  </Record>
</File>

The above was tested with Apache Daffodil 2.2.0.
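The last bullet in the feature list notes that the schema could be greatly simplified if children did not have to follow a parent of the matching type. A minimal, untested sketch of that flatter variant, assuming the same `ParentA:`/`ChildAA:`-style initiators as the example data, treats every line, parent or child, as one newline-terminated Record:

```xml
<xs:schema
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">

  <xs:include schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd" />

  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:format ref="GeneralFormat" lengthKind="delimited" />
    </xs:appinfo>
  </xs:annotation>

  <xs:element name="File">
    <xs:complexType>
      <!-- Sketch: every line is one Record, terminated by a newline -->
      <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="postfix">
        <xs:element name="Record" maxOccurs="unbounded">
          <xs:complexType>
            <!-- The initiator at the start of the line selects the branch -->
            <xs:choice dfdl:initiatedContent="yes">
              <xs:element name="ParentA" type="xs:string" dfdl:initiator="ParentA:" />
              <xs:element name="ParentB" type="xs:string" dfdl:initiator="ParentB:" />
              <xs:element name="ChildAA" type="xs:string" dfdl:initiator="ChildAA:" />
              <xs:element name="ChildAB" type="xs:string" dfdl:initiator="ChildAB:" />
              <xs:element name="ChildBA" type="xs:string" dfdl:initiator="ChildBA:" />
              <xs:element name="ChildBB" type="xs:string" dfdl:initiator="ChildBB:" />
            </xs:choice>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

The trade-off is that the infoset becomes one flat Record per line: the parent/child nesting is no longer expressed in the XML, and a ChildBA line following a ParentA line would be accepted rather than rejected.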

stevedlawrence
  • Thanks, I tried it the same way, but inside File we have Record and then ParentA or ParentB, whereas I want ParentA or ParentB directly. Record is just an extra tag that we are using for the loop. I don't want it to be part of the output, but without it the loop will not work, so how do I solve this problem? – Rishabh Oct 16 '18 at 05:54
  • Basically, I want my XML to look like: Parent A Content Child AA Content Child AB Content Parent B Content Child BB Content Parent A Content Child AB Content – Rishabh Oct 16 '18 at 06:17
  • That could be achieved by using unordered sequences (i.e. dfdl:sequenceKind="unordered"), but there are two issues with that. First, the output XML does not maintain data order; it is rearranged into schema order. So all ParentA elements would come first, followed by all ParentB elements, regardless of the order in which the parents appear in the data. Losing order information may or may not be okay. The second issue is that Daffodil 2.2.0 does not support unordered sequences, so it's not really an option right now. Keeping the Record elements is the only way to maintain order. – stevedlawrence Oct 18 '18 at 13:37
  • Alternatively, you could use XSLT or some other XML transformation language to transform the resulting XML infoset, removing the Record elements while keeping the Parent elements. Requiring a post-parse transformation is not uncommon, because a DFDL description describes the layout of the data rather than what you want the XML infoset to look like, and the two are often in conflict. – stevedlawrence Oct 18 '18 at 13:42
  • Thanks, Steve. Can you also tell me how we would process this if a child's format is decided by the 3rd byte of its parent's line rather than the 1st byte of its own line? – Rishabh Oct 23 '18 at 09:53
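For the post-parse transformation suggested in the comments, a minimal XSLT 1.0 sketch is an identity transform that drops every Record wrapper while keeping its children in document order. It assumes the infoset elements are in no namespace (the schema in the answer declares no targetNamespace) and has not been run against actual Daffodil output.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>
  <!-- Ignore whitespace-only text nodes from the indented infoset -->
  <xsl:strip-space elements="*"/>

  <!-- Identity template: copy every node and attribute unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Drop the Record wrapper (outer and inner) but keep its children -->
  <xsl:template match="Record">
    <xsl:apply-templates select="node()"/>
  </xsl:template>

</xsl:stylesheet>
```

With a processor such as xsltproc, something like `xsltproc unwrap-records.xsl infoset.xml` would apply it (both file names are hypothetical). For the example data, ParentA and ParentB then sit directly under File, each containing its Content followed directly by its child elements.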