I have been using the StAX parser in Pentaho Kettle for quite a while, but I have now run into a situation I don't know how to handle. Until now, the XML files had a predefined number of nesting levels, like this:

<A>
  <TRADE a="1" b="2">
    <TRADE a="3" b="4">
    </TRADE>
  </TRADE>
</A>

or

<A>
  <TRADE a="100" b="200">
    <TRADE a="1" b="2">
      <TRADE a="3" b="4">
        <TRADE a="5" b="6">
        </TRADE>
      </TRADE>
    </TRADE>
  </TRADE>
</A>

If a file contained two levels of TRADE, that was known in advance, and likewise for three or four levels (four being the maximum). The XPath was configured in the StAX parser accordingly (A/TRADE/TRADE/TRADE for three levels, and so on).

Expected output:

In the first case: two rows in the Trade table, one for the parent TRADE and one for the child TRADE. In the second case: four rows in the Trade table, each linked to its parent so that the parent-child relationship is preserved.

But now a file can contain any number of TRADE levels (minimum 1, maximum 15). How can I parse the TRADE file dynamically with the StAX parser in Pentaho Kettle, without knowing the number of levels (the depth) in advance?

Any guidance will be extremely helpful.

Regards, Vikas

Vikas Kumar
  • What is the expected output? – bolav Feb 29 '16 at 18:32
  • @bolav - I need to load the parsed data into the Trade table. As shown above, the first scenario should produce two rows, one for the parent and one for the child (with the parent-child relationship established through xml_element_id and parent_xml_element_id), whereas the second scenario should produce four rows (again with the parent-child relationship) in the Trade table. The parsing should happen dynamically, without knowing the number of trades beforehand. – Vikas Kumar Mar 01 '16 at 10:03
  • Please add the expected layout of the rows for your two examples, for me to be able to completely understand your question. Can one level have several trades? – bolav Mar 01 '16 at 10:10
  • @bolav - Yes, there can be multiple trades at any level. To explain the layout: there is a Trade table. Each trade is saved with its attributes together with an xml_element_id and an xml_parent_element_id. When a child trade is saved, its xml_parent_element_id refers to the xml_element_id of its parent trade (referential integrity). – Vikas Kumar Mar 01 '16 at 10:22
  • Are you looking for a row which has: `xml_element_id,xml_parent_element_id,a,b` for all trades? – bolav Mar 01 '16 at 10:53
  • @bolav - Yes, exactly. But the parsing must happen dynamically, irrespective of the number of trade levels. – Vikas Kumar Mar 01 '16 at 11:09

1 Answer

With this Transformation:

Screenshot of transformation

Filter rows:

Screenshot of Filter rows

Row denormaliser:

Screenshot of Row denormaliser

Gives the following output:

Screenshot of output
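
For readers who want to see the mechanics outside Kettle, here is a minimal sketch in plain Java using the standard `javax.xml.stream` API. It is not the transformation from the screenshots, only an illustration of why the depth does not need to be known in advance: a stack of the currently open element ids always has the parent on top. The class name `TradeFlattener`, the method `flatten`, and the id numbering (document order, root element = 0) are assumptions made for this example; the row layout `xml_element_id,xml_parent_element_id,a,b` is the one agreed on in the comments.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Kettle.
final class TradeFlattener {

    // Returns one "xml_element_id,xml_parent_element_id,a,b" line per TRADE element.
    static List<String> flatten(String xml) {
        Deque<Integer> openIds = new ArrayDeque<>(); // ids of the elements currently open
        List<String> rows = new ArrayList<>();
        int nextId = 0; // assumed numbering: document order, root element gets 0
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    int id = nextId++;
                    Integer parentId = openIds.peek(); // null only for the root element
                    if ("TRADE".equals(reader.getLocalName())) {
                        rows.add(id + "," + parentId + ","
                                + reader.getAttributeValue(null, "a") + ","
                                + reader.getAttributeValue(null, "b"));
                    }
                    openIds.push(id); // this element is the parent of whatever opens next
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    openIds.pop(); // element closed, its parent is on top again
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
        return rows;
    }

    public static void main(String[] args) {
        String xml = "<A><TRADE a=\"1\" b=\"2\"><TRADE a=\"3\" b=\"4\"></TRADE></TRADE></A>";
        flatten(xml).forEach(System.out::println);
    }
}
```

Because the parent is looked up on the stack rather than through a fixed path such as A/TRADE/TRADE/TRADE, the same code handles 1 level or 15, and several trades at the same level each get their own row.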

bolav