I know nothing about Pentaho and I just want to know whether it can do my job before I commit a lot of time to learn it.

Can Pentaho be used as an XML ETL tool? Can it perform arbitrary transformations? A typical transformation would be to collapse the XML element "Company" and its child element "Employee" into the same "Employee" table, but this is just the simplest case and there are endless other possibilities. For example, to import the following XML into our database:

    <Root>
      <OrdersByCustomer>
        <CustomerInfo>
          <Customer>
            <CustomerID>1234</CustomerID>
            ...
          </Customer>
          <Address>...</Address>
        </CustomerInfo>
        <Orders>
          <Order>...</Order>
          <Order>...</Order>
          <Order>...</Order>
        </Orders>
      </OrdersByCustomer>
    </Root>

I need to pick up the CustomerID and insert it together with the data inside each XML element "Order".

Can Pentaho do such arbitrary transformations? Or do I have to write my own code?

If the answer to the above question is yes, then, two more questions:

  1. Is Pentaho symmetric and bidirectional? We not only need to import XML into our database, we also need to generate XML from data in our database. Can Pentaho do that?

  2. If the answer is again yes: I know Pentaho is a framework and there are books written about it. Do I need to learn the whole framework, or can I just install it, spend half a day learning only the XML ETL part, and start using it?

1 Answer

  1. Yes, Kettle/PDI can just as easily export XML as it can import it.
  2. No. You can just play around with the XML parts of Kettle, which is itself only one part of the stack. To be clear, Pentaho is not a framework as such; it's a product stack with multiple subproducts, and you only need to look at the ETL part: PDI/Kettle.

Kettle is very easy to get started with, so just load it up, read some of the many samples, and have a go!

Codek
  • Thank you sir, but could you also answer my first question please: can Kettle do arbitrary transformations of XML? Can it pick up the CustomerID and insert it together with the details into the "Order" table? –  Oct 24 '12 at 22:58
  • Not entirely sure what you mean. You can use XPath to query data from XML; that data then becomes part of the PDI stream. Later on you can look up the customer ID, whether that's from a database or from another stream (which could also come from XML), and then write to the table. It's worth having a look through some of the samples, I think. What you're describing sounds very simple, so you should be able to get going quickly. – Codek Oct 31 '12 at 08:19
  • What I wanted to know is, without using XPath, purely using the UI drag and drop style to set up, can PDI pick up the CustomerID and put it together with the fields contained inside , and present it as a resultset? –  Nov 01 '12 at 04:59
  • Yes. You'll have to read the orders from the file in one stream, then read the customer details in another stream, and use either a join or a stream lookup component to do your lookup. – Codek Nov 05 '12 at 07:57
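
For readers who want to see the shape of the result before opening PDI, here is a minimal sketch of the same logic in plain Python. It is not PDI and not how Kettle works internally; it only illustrates the idea from the comments above of querying the XML by path and attaching the parent's CustomerID to each order row. The `OrderID` and `Amount` fields and the `flatten_orders` helper are hypothetical, since the question elides the contents of `<Order>`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: OrderID and Amount are invented for illustration,
# because the original question elides what <Order> contains.
SAMPLE = """
<Root>
  <OrdersByCustomer>
    <CustomerInfo>
      <Customer>
        <CustomerID>1234</CustomerID>
      </Customer>
    </CustomerInfo>
    <Orders>
      <Order><OrderID>A1</OrderID><Amount>10.50</Amount></Order>
      <Order><OrderID>A2</OrderID><Amount>7.25</Amount></Order>
    </Orders>
  </OrdersByCustomer>
</Root>
"""


def flatten_orders(xml_text):
    """Return one dict per <Order>, each carrying its parent's CustomerID."""
    root = ET.fromstring(xml_text)
    rows = []
    for group in root.findall("OrdersByCustomer"):
        # Path query for the customer key (ElementTree supports a subset of XPath).
        customer_id = group.findtext("CustomerInfo/Customer/CustomerID")
        for order in group.findall("Orders/Order"):
            # Start each row with the parent key, then copy the order's child fields.
            row = {"CustomerID": customer_id}
            for field in order:
                row[field.tag] = field.text
            rows.append(row)
    return rows


rows = flatten_orders(SAMPLE)
for row in rows:
    print(row)
```

Each resulting row is a flat record that could be inserted into an "Order" table. In PDI the equivalent would be built from steps in the UI rather than written as code.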