
I have 4 million records that need to be loaded daily from source to target, and we truncate the target every day. The load takes about 9 hours, as there are around 10 tables each loading 4 million records every day. Could you please tell me how I can improve the performance? Thanks

Ujjwal Chowdary
  • What RDBMS are you using? – Nikhil Oct 31 '16 at 18:19
  • My source is Oracle and the destination is MySQL. – Ujjwal Chowdary Oct 31 '16 at 18:32
  • I know this solution and have already implemented it, but it didn't help, even after increasing the number of copies to 4. – Ujjwal Chowdary Nov 01 '16 at 18:51
  • OK. Then I can't really think of any other way in Pentaho to make it faster. What about the bulk loader? – Nikhil Nov 01 '16 at 18:54
  • I tried the MySQL bulk loader, but I get an error creating the mkfifo file on the Linux server, since the MySQL bulk loader can only be used in a Linux environment. – Ujjwal Chowdary Nov 01 '16 at 19:49
  • Hello Nikhil, where do I add the settings useServerPrepStmts=false, rewriteBatchedStatements=true, useCompression=true inside the Pentaho server so that the MySQL connection in the Pentaho Admin Console picks them up? – Ujjwal Chowdary Nov 09 '16 at 15:09
  • In the Database Connections window you will see "Options" on the left side; you can enter these parameters there. – Nikhil Nov 09 '16 at 16:46
  • I want to make the changes in the BI server, as the database connections are configured there. Please tell me how to do it. Do I have to edit a JDBC file? If so, what is the path for it? Thanks – Ujjwal Chowdary Nov 09 '16 at 18:14
  • I am not sure. See if this helps: https://help.pentaho.com/Documentation/5.1/0H0/060/010/030/020 – Nikhil Nov 09 '16 at 20:05

1 Answer


You have two options:

  1. Use the MySQL Bulk Loader step available in PDI. It will definitely make the data loading faster.
  2. Speed can also be boosted with a few simple JDBC connection settings:

    useServerPrepStmts=false
    rewriteBatchedStatements=true
    useCompression=true
    

These options should be entered in PDI on the connection: double-click the connection, go to Options, and set these values.
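If the connection is configured outside of Spoon (for example directly through a JDBC URL, as asked in the comments), the same three options can be appended as URL parameters. Here is a minimal sketch in plain JDBC, assuming MySQL Connector/J is on the classpath; the host, database, user, and password are placeholders:

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class MySqlConnectionExample {
        public static void main(String[] args) throws Exception {
            // The three options are passed as Connector/J URL parameters.
            String url = "jdbc:mysql://localhost:3306/targetdb"
                    + "?useServerPrepStmts=false"
                    + "&rewriteBatchedStatements=true"
                    + "&useCompression=true";

            // Placeholder credentials; replace with the real target connection.
            try (Connection conn = DriverManager.getConnection(url, "user", "password")) {
                System.out.println("Connected with batch rewriting and compression enabled");
            }
        }
    }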

Used together, useServerPrepStmts=false and rewriteBatchedStatements=true will “fake” batch inserts on the client. Specifically, the insert statements:

INSERT INTO t (c1,c2) VALUES ('One',1);
INSERT INTO t (c1,c2) VALUES ('Two',2);
INSERT INTO t (c1,c2) VALUES ('Three',3);

will be rewritten into:

INSERT INTO t (c1,c2) VALUES ('One',1),('Two',2),('Three',3);
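In plain JDBC terms, the rewriting kicks in for batched prepared statements. The sketch below (table and column names reuse the example above; host and credentials are placeholders) shows the kind of batch that Connector/J collapses into a single multi-row INSERT when rewriteBatchedStatements=true is set:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class BatchInsertExample {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:mysql://localhost:3306/targetdb"
                    + "?useServerPrepStmts=false&rewriteBatchedStatements=true";

            try (Connection conn = DriverManager.getConnection(url, "user", "password");
                 PreparedStatement ps = conn.prepareStatement(
                         "INSERT INTO t (c1, c2) VALUES (?, ?)")) {

                conn.setAutoCommit(false);          // commit once per batch, not per row
                String[] values = {"One", "Two", "Three"};
                for (int i = 0; i < values.length; i++) {
                    ps.setString(1, values[i]);
                    ps.setInt(2, i + 1);
                    ps.addBatch();                  // queued on the client side
                }
                ps.executeBatch();                  // driver sends one multi-row INSERT
                conn.commit();
            }
        }
    }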

The third option, useCompression=true, compresses the traffic between the client and the MySQL server.

Finally, you can increase the number of copies of the output step to 2 so that two threads insert into the database.

Hope it helps!

Reference (Option 2): https://anonymousbi.wordpress.com

Nikhil