SELECT COUNT query with many left JOIN clause is taking too long

Question

This is the query to count the rows only

SELECT COUNT(*)
FROM INTERFACES.EAI_DJA_PAGU  A
LEFT JOIN INTERFACES.EAI_DJA_SATKER B ON B.KDSATKER = A.KDSATKER 
LEFT JOIN INTERFACES.EAI_DJA_UNIT C ON C.KDUNIT = A.KDUNIT 
LEFT JOIN INTERFACES.EAI_DJA_PROGRAM D ON D.KDPROGRAM = A.KDPROGRAM 
LEFT JOIN INTERFACES.EAI_DJA_KEGIATAN E ON E.KDGIAT = A.KDGIAT 
LEFT JOIN INTERFACES.EAI_DJA_OUTPUT F ON F.KDOUTPUT = A.KDOUTPUT

EAI_DJA_PAGU have 3867 rows. EAI_DJA_SATKER have 1000 rows. EAI_DJA_UNIT have 305 rows. EAI_DJA_PROGRAM have 596 rows. EAI_DJA_KEGIATAN have 3280 rows. EAI_DJA_OUTPUT have 5000 rows.

And the query took like 13 minutes and 30 seconds to finish and returns roughly 9 billion of rows.

I have already created indexes for all the joined column as well as the record id as it is auto created on table creation as the primary key. Here are all the indexes create query

CREATE INDEX INTERFACES.EAI_DJA_PAGU_KDSATKER_IDX ON INTERFACES.EAI_DJA_PAGU (KDSATKER);
CREATE INDEX INTERFACES.EAI_DJA_SATKER_KDSATKER_IDX ON INTERFACES.EAI_DJA_SATKER (KDSATKER);
CREATE INDEX INTERFACES.EAI_DJA_UNIT_KDUNIT_IDX ON INTERFACES.EAI_DJA_UNIT (KDUNIT);
CREATE INDEX INTERFACES.EAI_DJA_PROGRAM_KDPROGRAM_IDX ON INTERFACES.EAI_DJA_PROGRAM (KDPROGRAM);
CREATE INDEX INTERFACES.EAI_DJA_KEGIATAN_KDGIAT_IDX ON INTERFACES.EAI_DJA_KEGIATAN (KDGIAT);
CREATE INDEX INTERFACES.EAI_DJA_OUTPUT_KDOUTPUT_IDX ON INTERFACES.EAI_DJA_OUTPUT (KDOUTPUT);

I have also tried to run all these queries

CALL DBMS_STATS.gather_table_stats('INTERFACES', 'EAI_DJA_PAGU');
CALL DBMS_STATS.gather_table_stats('INTERFACES', 'EAI_DJA_SATKER');
CALL DBMS_STATS.gather_table_stats('INTERFACES', 'EAI_DJA_UNIT');
CALL DBMS_STATS.gather_table_stats('INTERFACES', 'EAI_DJA_PROGRAM');
CALL DBMS_STATS.gather_table_stats('INTERFACES', 'EAI_DJA_KEGIATAN');
CALL DBMS_STATS.gather_table_stats('INTERFACES', 'EAI_DJA_OUTPUT');

But after all that, the query still took a lot of time to return the row counts. Is there any way out of this situation or optimization that we may do on this query maybe? Or the query just too much??

Here are the execution plan of the query

Predicate Information
------------------------------------------------
     3 - access: ("C"."KDUNIT" = "A"."KDUNIT") (0.043)
     5 - access: ("D"."KDPROGRAM" = "A"."KDPROGRAM") (0.009)
     7 - access: ("E"."KDGIAT" = "A"."KDGIAT") (0.000)
     9 - access: ("B"."KDSATKER" = "A"."KDSATKER") (0.001)
     13 - access: ("F"."KDOUTPUT" = "A"."KDOUTPUT") (0.014)

Roughly 9 **billion** rows, not *million*. I guess it does take time to count that many rows. EAI_DJA_PAGU has only one index, on KDSATKER column. What about all other columns involved in joins? You created indexes on columns that belong to other (outer joined) tables, but - why didn't you do that for EAI_DJA_PAGU as well? — Littlefoot, Aug 16 '23 at 08:22
Okay, I will try to create the rest of the indexes for EAI_DJA_PAGU as well and give you the update — Raymond Natio, Aug 16 '23 at 08:37
Please post an actual execution plan for the query. Otherwise it's hard to tell what is actually going on inside the database. And while you don't know what is done, you cannot say how to optimize it — astentx, Aug 16 '23 at 08:43
Hi @astentx, I have added the execution plan details in the post — Raymond Natio, Aug 16 '23 at 09:13
you may find better info on where you're loosing time if you look at a sqlmonitor of the query. Depending on your version, you may find different methods to get it (and it requires a license). That being said, I suspect the indexes may not help that much (unless they are much smaller than the tables), and you don't have any predicate restricting the result set of any table. If you produce a sqlmonitor, it will tell also if you're loosing too much time in the hash join parts of your query. — gsalem, Aug 16 '23 at 09:50
This has nothing to do with execution plans or stats or indexes. The row count is the key piece of information here, indicating a many-to-many-to-many join scenario that is getting itself tied up writing huge hash inputs to temp space. See my answer for further details. — Paul W, Aug 16 '23 at 13:05

Paul W · Accepted Answer · 2023-08-16T13:08:24.493

This is what happens when your join clauses define a many-to-many join to one or more of your tables. Every join should use a key that is unique on at least one side of the join. If a join key is non-unique on both sides of a join, it will create a partial Cartesian product, which has the potentially to vastly explode the number of rows in your result set. I say "partial" because the execution plan will not show you that there is a Cartesian join, but one is actually happening within each group of join key values. That is clearly what is happening to give you this many rows. In fact if you ever get more result rows than the largest table involved, you are guaranteed to have a many-to-many somewhere.

A many-to-many which balloons an intermediate resultset in turn causes any subsequent hash joins to be very slow: the amount of I/O to temporary tablespace for building these hash join inputs on such a massive result set can take a very long time. Solve the excessive row problem and you solve the performance problem.

So, go through each of your tables and test for uniqueness. For example:

Test first join:

SELECT KDSATKER,COUNT(*)
  FROM INTERFACES.EAI_DJA_SATKER
 GROUP BY KDSATKER
HAVING COUNT(*) > 1

SELECT KDSATKER,COUNT(*)
  FROM INTERFACES.EAI_DJA_PAGU 
 GROUP BY KDSATKER
HAVING COUNT(*) > 1

Test second join:

SELECT KDUNIT,COUNT(*)
  FROM INTERFACES.EAI_DJA_UNIT
 GROUP BY KDUNIT
HAVING COUNT(*) > 1

SELECT KDUNIT,COUNT(*)
  FROM INTERFACES.EAI_DJA_PAGU 
 GROUP BY KDUNIT
HAVING COUNT(*) > 1

and so forth with each of your joins. If any of these tests gives you results for both queries, you have a many-to-many join. If that's the case with several tables, you have a many-to-many-to-many-to-many set of joins and that can easily produce billions of rows.

To fix, determine what additional columns are needed in each join to describe uniqueness on one or the other side of the join. Once every one of your joins are one-to-one and only one of them is one-to-many and there are none remaining that are many-to-many, it will resolve your performance issue.

There is however another way to get a many-to-many join that is less obvious. If you have multiple one-to-many joins, then the product of those multiple joins against themselves is a many-to-many. For example:

join1:
rowA-->1
rowA-->2
rowA-->3

gives you three rows.

join2:
rowA-->M
rowA-->N
rowA-->O

also gives you three rows. But there are three rows from join2 for every driving row which already includes 3 rows from join1. So you will end up with 9 rows:

rowA-->1-->M
rowA-->1-->N
rowA-->1-->O
rowA-->2-->M
rowA-->2-->N
rowA-->2-->O
rowA-->3-->M
rowA-->3-->N
rowA-->3-->O

The solution to this is to only have at most one one-to-many join in the same query block. If you need multiple tables with one-to-many relationships in the same query, you probabably need to use inline views (inner query blocks in the FROM clause) that do GROUP BYs on the join keys so that they are unique when joining at the parent block level. Example:

SELECT A.*,
       B.column_I_need1,
       C.column_I_need2
  FROM INTERFACES.EAI_DJA_PAGU A
       LEFT JOIN (SELECT KDSATKER,
                         MAX(column_I_need) column_I_need1
                    FROM INTERFACES.EAI_DJA_SATKER
                   GROUP BY KDSATKER) B ON A.KDSATKER = B.KDSATKER
       LEFT JOIN (SELECT KDUNIT,
                         MAX(column_I_need) column_I_need2
                    FROM INTERFACES.EAI_DJA_UNIT
                   GROUP BY KDUNIT) C ON A.KDUNIT = C.KDUNIT

Thank you very much for your help! @PaulW – Raymond Natio Aug 21 '23 at 03:42 — Raymond Natio, Aug 21 '23 at 03:42

SELECT COUNT query with many left JOIN clause is taking too long

1 Answers1