I suspect the bottleneck is in the Excel Input step rather than the Filter step.
As a matter of fact, the Excel Input step is very, very slow, which is why I use the CSV Input step whenever possible.
The Filter step is fast; speeds of more than a few thousand rows/s are common. In your case, this step spends its time waiting to receive rows from the Excel Input step rather than actually working. That explains why its speed is 49 rows/s, not far from the 60 rows/s of the Excel Input step.
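The effect is easy to reproduce outside PDI. The Python sketch below is a toy model, not PDI code: a deliberately slow reader thread feeds a fast filter through a bounded queue that stands in for a hop's row set. The delay, row count, buffer size and even-row filter condition are made-up illustration values. Because the filter can only process rows as fast as they arrive, almost all of its elapsed time is spent blocked on the queue, so its observed rate tracks the reader's rate.

```python
import queue
import threading
import time


def run_pipeline(n_rows=50, read_delay=0.01, rowset_size=10):
    """Toy model: a slow reader feeds a fast filter through a bounded buffer.

    Returns (filter_waiting_seconds, filter_working_seconds, rows_passed).
    """
    buf = queue.Queue(maxsize=rowset_size)  # stands in for the hop's row set
    done = object()  # sentinel marking the end of the input

    def reader():
        for i in range(n_rows):
            time.sleep(read_delay)  # slow "Excel Input": roughly 1/read_delay rows/s
            buf.put(i)
        buf.put(done)

    t = threading.Thread(target=reader)
    t.start()
    waiting = working = 0.0
    passed = 0
    while True:
        t0 = time.perf_counter()
        row = buf.get()  # the filter blocks here while the reader is still busy
        t1 = time.perf_counter()
        waiting += t1 - t0
        if row is done:
            break
        if row % 2 == 0:  # hypothetical filter condition: keep even rows
            passed += 1
        working += time.perf_counter() - t1
    t.join()
    return waiting, working, passed


if __name__ == "__main__":
    waiting, working, passed = run_pipeline()
    print(f"filter waiting {waiting:.3f}s, working {working:.5f}s, rows passed {passed}")
```

Raising read_delay slows the whole pipeline even though the filter's own per-row work never changes, which is the same pattern as the 49 rows/s versus 60 rows/s above.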
The fact that the process slows down after x rows is an indication that memory is filling up and the JVM spends its time on garbage collection, or the operating system starts swapping to disk. Try increasing the memory settings in spoon.bat/spoon.sh, for example (spoon.bat syntax):
set PENTAHO_DI_JAVA_OPTIONS="-Xms1024m" "-Xmx4096m" "-XX:MaxPermSize=256m"
(-XX:MaxPermSize is ignored by Java 8 and later.)
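Before and after editing the launcher, it can help to confirm what heap ceiling an options string actually requests. This is a minimal Python sketch, assuming the options are written in the same quoted form as the spoon.bat line above; the function name max_heap_bytes is hypothetical, and it only parses the string, it does not query a running JVM.

```python
import os
import shlex

# Multipliers for the common size suffixes accepted by -Xmx (no suffix means bytes).
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def max_heap_bytes(java_options):
    """Return the -Xmx value in bytes from a JVM options string, or None if absent."""
    result = None
    # shlex.split removes the double quotes placed around each option in spoon.bat.
    for token in shlex.split(java_options):
        if not token.startswith("-Xmx") or len(token) == 4:
            continue
        value = token[4:]
        suffix = value[-1].lower() if value[-1].isalpha() else ""
        number = value[:-1] if suffix else value
        # A later -Xmx overrides an earlier one, so keep the last value seen.
        result = int(number) * _UNITS[suffix]
    return result


if __name__ == "__main__":
    opts = os.environ.get("PENTAHO_DI_JAVA_OPTIONS", "")
    heap = max_heap_bytes(opts)
    print("no -Xmx set" if heap is None else f"max heap: {heap // 1024**2} MiB")
```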
Something else you can try is adjusting the number of rows PDI keeps buffered between steps. Click anywhere on the canvas, then choose Properties, Miscellaneous, Number of rows in row set. Reduce it until you find the right balance between the size of the batch read by the Excel Input step and the total number of records kept in memory.
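Reducing the row-set size trades some throughput for memory. As a rough back-of-the-envelope check, assuming each hop buffers up to that many rows (one row set per hop, no extra step copies) and a guessed average in-memory size per row, the worst-case memory held in buffers grows linearly with the setting. The function names and the per-row size below are hypothetical illustration values; real Java row objects carry overhead that is hard to estimate precisely.

```python
def rowset_memory_bytes(hops, rows_per_rowset, bytes_per_row):
    """Worst case: every hop's row set is full at the same time."""
    return hops * rows_per_rowset * bytes_per_row


def largest_rowset_size(hops, bytes_per_row, budget_bytes):
    """Largest row-set size whose worst-case buffering fits in budget_bytes."""
    return budget_bytes // (hops * bytes_per_row)


if __name__ == "__main__":
    # Hypothetical transformation: 3 hops, about 2 KiB per row in memory.
    used = rowset_memory_bytes(3, 10_000, 2048)
    print(f"10000 rows per row set: about {used / 1024**2:.1f} MiB buffered")
    print("largest size within 256 MiB:", largest_rowset_size(3, 2048, 256 * 1024**2))
```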
But the best option is to avoid the Excel 2007 XLSX spreadsheet type altogether.
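If the source files can be converted ahead of time, one option (a separate preprocessing step, not a PDI feature) is to export the sheet to CSV and read it with the CSV Input step. This is a minimal sketch using the third-party openpyxl library in read-only mode, which streams rows instead of loading the whole workbook; the function name xlsx_to_csv and the UTF-8 output encoding are assumptions.

```python
import csv

from openpyxl import load_workbook  # third-party: pip install openpyxl


def xlsx_to_csv(xlsx_path, csv_path, sheet_name=None):
    """Stream one worksheet of an .xlsx file into a CSV file; return rows written."""
    # read_only=True streams rows; data_only=True writes cached formula results
    # instead of the formula text.
    wb = load_workbook(xlsx_path, read_only=True, data_only=True)
    try:
        ws = wb[sheet_name] if sheet_name else wb.active
        count = 0
        with open(csv_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            for row in ws.iter_rows(values_only=True):
                # Empty cells come back as None; write them as empty fields.
                writer.writerow(["" if v is None else v for v in row])
                count += 1
        return count
    finally:
        wb.close()  # read-only workbooks keep the file open until closed
```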