2
INSERT INTO table3
SELECT
  tbl1.pk_1, tbl1.pk_2, tbl2.pk_3, tbl1.pk4
FROM
  table1 tbl1, table2 tbl2
WHERE
  tbl1.pk_1 = 'root' and
  NOT EXISTS 
    (SELECT 1 FROM table3 tbl3
     WHERE tbl3.pk_1 = tbl1.pk_1 and tbl3.pk_2 = tbl1.pk_2 and tbl3.pk_3 = tbl2.pk_3 and tbl3.pk_4 = tbl1.pk_4) 
;

My SQL statements looks like this. table1 and table3 have over 1000000 rows,table2 has 100 rows. This SQL statement is very slow. It takes more than 100 seconds finish. Is there any way to improve it?

Noel
  • 10,152
  • 30
  • 45
  • 67
eeandrew
  • 89
  • 2
  • 9

3 Answers3

1

Use MERGE, it serves faster

merge into table3 tbl3
using (select tbl1.pk_1, tbl1.pk_2, tbl2.pk_3, tbl1.pk4
         from table1 tbl1, table2 tbl2
        where tbl1.pk_1 = 'root') tb1
   on (tbl3.pk_1 = tbl.pk_1 and tbl3.pk_2 = tbl.pk_2 and tbl3.pk_3 = tbl.pk_3 and tbl.pk_4 = tbl1.pk_4) 
when not matched then 
     insert (tbl3.pk_1, tbl3.pk_2, tbl3.pk_3, tbl3.pk_4)
     VALUES (tbl.pk_1, tbl.pk_2, tbl.pk_3, tbl.pk_4);
San
  • 4,508
  • 1
  • 13
  • 19
1

I found it is very hard to improve the performance for this SQL statements. I test MERGE, however it takes more than 200 seconds. Even if the performance can be improved, I guess it can only be improved like 1 or 2 seconds, Which doesn't meet our business requirement. (I have around 20 SQL statements like this one). So I reviewed our business logic, and I found that this SQL statement is not necessary. We just need a more careful algorithm to gather the new data, Make an cartesian product then insert this new data into table3. In this way, this SQL statement can be removed.And the new process only takes a few seconds! I think when we improve a SQL statement, please first make sure this SQL statement is absolutely necessary.

eeandrew
  • 89
  • 2
  • 9
0

I'm pretty sure your table expression is incomplete:

FROM
  table1 tbl1, table2 tbl2

You're (probably accidentally) producing a cartesian product between table1 and table2, with 1000000 x 100 records in memory. Your table expression should probably read:

FROM
  table1 tbl1, table2 tbl2
WHERE
  tbl1.some_column = tbl2.some_column

Or even better (to prevent such things in the future)

FROM
  table1 tbl1
JOIN
  table2 tbl2 ON tbl1.some_column = tbl2.some_column
Lukas Eder
  • 211,314
  • 129
  • 689
  • 1,509
  • table1 and table2 are independent. Our application requires to do a cartesian product between table1 and table2. I think the bottleneck is "NOT EXISTS" which requires a "NESTED LOOPS ANTI" search. I'm wondering it there any to do "NOT EXISTS" better. – eeandrew Mar 19 '14 at 08:29
  • OK, I understand. So how many records do you expect per update? I.e. will the `NOT EXISTS` predicate remove most of the tuples produced from the cartesian product again? – Lukas Eder Mar 19 '14 at 08:35
  • Yes exactly. Although the cartesian product millions of rows, only a few rows (around 100 ) will pass the "NOT EXISTS" condition. – eeandrew Mar 19 '14 at 08:44
  • Then I agree that `MERGE` might outperform this `INSERT .. SELECT`. – Lukas Eder Mar 19 '14 at 09:31
  • Thanks very much. On my server, MERGE works slower than INSERT .. SELECT. I found a way to bypass the "NOT EXISTS" part. It's related to our business logic so developers never think in this direction before. Now I can remove this SQL statement. Thanks again for your reply. – eeandrew Mar 19 '14 at 10:21
  • @user2878435: No problem. You can then answer your own question on Stack Overflow. This would be quite helpful to future visitors. – Lukas Eder Mar 19 '14 at 12:54