I am very new to the Kettle tool and found a transformation property called "Transformation Engine Type" that can be changed. Can someone help me understand what "Transformation Engine Type" means and, if it is set to "Serial Single Threaded", how the transformation's behavior changes?

Krishna Gond

1 Answer

By default, PDI transformations launch all steps in parallel, each in its own thread. Take a transformation with 4 steps:

Table input --> Dimension lookup --> Calculator --> Table output

Each step processes rows as they arrive. Table input sends the first block of a few thousand rows to Dimension lookup, and the lookups start immediately. With a large volume of data, you will have 4 threads continuously doing work, with rows of data passed from one thread to the next.

This is the normal behaviour and it's one of the engine's strengths.
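
To make the parallel model concrete, here is a minimal Python sketch of the idea, not Kettle's actual code or API. Each step runs in its own thread and hands rows to the next step through a bounded buffer. The names `run_step`, `run_parallel` and `buffer_size` are made up for illustration, and the steps are plain functions standing in for things like Calculator.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the row stream


def run_step(func, inbox, outbox):
    """Apply func to each row as it arrives, then forward the end marker."""
    while True:
        row = inbox.get()
        if row is DONE:
            outbox.put(DONE)
            return
        outbox.put(func(row))


def run_parallel(rows, step_funcs, buffer_size=1000):
    """Run every step in its own thread, linked by bounded queues (hypothetical helper)."""
    queues = [queue.Queue(maxsize=buffer_size) for _ in range(len(step_funcs) + 1)]

    def feed():
        # Plays the role of the input step: pushes rows into the first buffer.
        for row in rows:
            queues[0].put(row)
        queues[0].put(DONE)

    threads = [threading.Thread(target=feed)]
    threads += [
        threading.Thread(target=run_step, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(step_funcs)
    ]
    for t in threads:
        t.start()

    # The calling thread drains the last buffer so upstream steps never block forever.
    results = []
    while True:
        row = queues[-1].get()
        if row is DONE:
            break
        results.append(row)
    for t in threads:
        t.join()
    return results
```

Calling `run_parallel(range(5), [lambda r: r + 1, lambda r: r * 2])` starts one feeder thread plus one thread per step. The sketch only shows the structure (threads plus buffers between them), so don't read anything into its speed.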

However, you may have a very large transformation with dozens of steps, each doing very little work. In such a case, the overhead of parallelising the execution doesn't pay off, and you end up with many threads waiting for CPU time. You may then be better off choosing the Single Threaded execution model, in which all steps run in the same thread and data is processed serially.
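
As a rough picture of the serial model, here is a small Python sketch, again an illustration rather than Kettle's real implementation. A single loop in a single thread pushes each row through every step in order, so there are no threads or buffers to coordinate. The name `run_single_threaded` is hypothetical, and the real engine may well move rows between steps in batches rather than one at a time.

```python
def run_single_threaded(rows, step_funcs):
    """Push each row through all steps in order, in the calling thread (hypothetical helper)."""
    results = []
    for row in rows:
        for func in step_funcs:
            # The output of one step becomes the input of the next.
            row = func(row)
        results.append(row)
    return results
```

With many cheap steps, a loop like this avoids the cost of scheduling dozens of threads that each do almost nothing, which is the trade-off described above.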

Which one is better depends a lot on your specific use case, and there's no substitute for actually trying both and comparing their speeds.

nsousa