
My Pentaho Data Integration (pdi-ce-7.1.0.0-12) job is taking too long to read data from MS SQL Server 2008 R2. The PDI Spoon application is installed on a Windows server. The PDI application server and the MS SQL Server instance are on different server IPs, but in the same environment.

Please help me speed up my job so it retrieves data faster.

  • 2
    There is no way anyone can help you without some good solid details of a specific performance issue. – Dale K Feb 21 '19 at 09:32
  • Thanks Dale for your reply. I am using a Table Input step in my Pentaho Data Integration transformation to read SQL data (from MS SQL Server 2008 R2) and load it into another table on another server. In this process, the Table Input SQL query takes a long time to read the data, whereas the same SQL query returns output much faster in MS SQL Server Management Studio. Please let me know if there are any specific details you want. – Kedar Patwardhan Feb 21 '19 at 09:39
  • Firstly, please update your question directly; don't add additional details in the comments. Secondly, that's still not specific. What is the table structure? How many rows? How long is it taking? – Dale K Feb 21 '19 at 09:43
  • There is a KEY factor in the comments you omitted from the question: 'I am using a Table Input step in my Pentaho Data Integration transformation to read SQL data (from MS SQL Server 2008 R2) and load it into another table on another server.' This bridging might be the slow part — you have 3 choke points: INPUT > PDI > OUTPUT. It also depends on how you're doing the output: is it an UPDATE? A full upload with truncate? An incremental update? We need more to help you. – Cristian Curti Feb 22 '19 at 14:18

1 Answer


A good starting point for this kind of slowness issue is to increase the memory available to PDI. To do that, edit Spoon.bat (Windows) or Spoon.sh (Linux).

In that file there is a line like this:

if "%PENTAHO_DI_JAVA_OPTIONS%"=="" set PENTAHO_DI_JAVA_OPTIONS="-Xms2048m" "-Xmx4096m" "-XX:MaxPermSize=512m"

"-Xms2048m" - This is how much memory PDI will start with.

"-Xmx4096m" - This is how much memory MAX PDI will use during execution.

These numbers are in MEGABYTES, so 2048m = ~2Gb RAM

Personally I use 2048m to start and 4096m as the maximum, and I run processes of around 500,000 to almost 1,000,000 rows with no problem.
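On Linux the same options go in Spoon.sh. A minimal sketch of the equivalent edit, assuming your distribution's Spoon.sh respects `PENTAHO_DI_JAVA_OPTIONS` the same way the Windows script does (it does in stock PDI 7.x):

```shell
# Spoon.sh -- equivalent JVM memory settings on Linux.
# Only set the options if the user hasn't already provided their own,
# mirroring the if-check in Spoon.bat.
if [ -z "$PENTAHO_DI_JAVA_OPTIONS" ]; then
  PENTAHO_DI_JAVA_OPTIONS="-Xms2048m -Xmx4096m -XX:MaxPermSize=512m"
fi
export PENTAHO_DI_JAVA_OPTIONS
```

Note that `-XX:MaxPermSize` only applies on Java 7 and earlier; on Java 8+ the JVM ignores it with a warning, since PermGen was replaced by Metaspace.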

Cristian Curti
  • Thanks Cristian for your reply, it's helpful to me. I have edited spoon.bat; below is the content I found: if "%PENTAHO_DI_JAVA_OPTIONS%"=="" set PENTAHO_DI_JAVA_OPTIONS="-Xms3072m" "-Xmx5120m" "-XX:MaxPermSize=512m". My MS SQL table has crore rows that need to be transferred to the database server. I have scheduled my job in Jenkins, and I have also added indexes on the table. My PDI Spoon is on one server and the DB is on another server (but in the same environment). Is there any other way to reduce the read time from PDI? – Kedar Patwardhan Feb 22 '19 at 10:14
  • It could depend on the query you're using. Remember that PDI adds a lot of overhead, so querying MSSQL Server directly (using an SQL tool) is going to be faster than Pentaho; also, some queries are simply 'heavier' than others (more columns, or more rows). – Cristian Curti Feb 22 '19 at 14:16
  • Thanks Cristian for your help. Firstly I will increase the RAM allocated to the PDI application, and will revert with an update. – Kedar Patwardhan Feb 25 '19 at 04:59