1

I have an Oracle 19c database table with "resources" that are organized hierarchically like a nested folder tree. The table contains around 2.5 million rows and the tree is up to 10 levels deep.

create table RESOURCES (
    ID_       NUMBER(10) not null constraint PK_RESOURCES primary key,
    FOLDERID_ NUMBER(10) constraint FK_PARENTFOLDER references RESOURCES
);

create index FOLDERIDINDEX on RESOURCES (FOLDERID_);

I'm using SQL recursive common table expressions (aka recursive subquery factoring) to find all descendants of some given resources.

In general, this works quite nicely, but if I try to get descendants of multiple folders with one query, some queries don't perform at all using Oracle. I'd like to understand why this is the case, and if there's some easy way to speed things up (query hint?, bugfix?, ...?)

For example, this statement does not return within 60 minutes(!):

WITH cte1 (id_) AS (SELECT id_ FROM Resources where id_ = 11
                    UNION ALL
                    SELECT r.id_ FROM Resources r, cte1 c WHERE r.folderId_ = c.id_),
     cte2 (id_) AS (SELECT id_ FROM Resources where id_ = 808965
                    UNION ALL
                    SELECT r.id_ FROM Resources r, cte2 c WHERE r.folderId_ = c.id_)
SELECT count(*)
FROM Resources r
WHERE (r.folderId_ IN (SELECT * FROM cte1) OR r.folderId_ IN (SELECT * FROM cte2));

If I replace the two sub-selects in the last line with a UNION, it just takes a few seconds:

WITH cte1 (id_) AS (SELECT id_ FROM Resources where id_ = 11
                    UNION ALL
                    SELECT r.id_ FROM Resources r, cte1 c WHERE r.folderId_ = c.id_),
     cte2 (id_) AS (SELECT id_ FROM Resources where id_ = 808965
                    UNION ALL
                    SELECT r.id_ FROM Resources r, cte2 c WHERE r.folderId_ = c.id_)
SELECT count(*)
FROM Resources r
WHERE (r.folderId_ IN (SELECT * FROM cte1 UNION SELECT * FROM cte2));

While that could already be the solution, it's a bit hard to change in my project, because SQL is auto-generated by code from application queries, and at that point not so easy to change. Also, application queries could be much more complex and such a replacement might not even be possible. These are just simplified examples. Maybe there's some other way to speed things up?

Interestingly, the slow query works without any performance problems on other databases like MySQL 8, PostgreSQL 13, SQL Server 2016 (with small syntax changes for the databases). It's just Oracle which seems to have a problem here.

This is the query plan for the first query, i.e. the slow one:

------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                            |              |     1 |     5 |    54G  (1)|591:38:12 |
|   1 |  SORT AGGREGATE                             |              |     1 |     5 |            |          |
|*  2 |   FILTER                                    |              |       |       |            |          |
|   3 |    TABLE ACCESS FULL                        | RESOURCES    |  2410K|    11M|  9389   (1)| 00:00:01 |
|*  4 |    VIEW                                     |              |   239K|  3046K| 23128   (1)| 00:00:01 |
|   5 |     UNION ALL (RECURSIVE WITH) BREADTH FIRST|              |       |       |            |          |
|*  6 |      INDEX UNIQUE SCAN                      | PK_RESOURCES |     1 |     6 |     2   (0)| 00:00:01 |
|*  7 |      HASH JOIN                              |              |   239K|  5623K| 23126   (1)| 00:00:01 |
|   8 |       RECURSIVE WITH PUMP                   |              |       |       |            |          |
|   9 |       BUFFER SORT (REUSE)                   |              |       |       |            |          |
|  10 |        TABLE ACCESS FULL                    | RESOURCES    |  2410K|    25M|  9386   (1)| 00:00:01 |
|* 11 |    VIEW                                     |              |   239K|  3046K| 23128   (1)| 00:00:01 |
|  12 |     UNION ALL (RECURSIVE WITH) BREADTH FIRST|              |       |       |            |          |
|* 13 |      INDEX UNIQUE SCAN                      | PK_RESOURCES |     1 |     6 |     2   (0)| 00:00:01 |
|* 14 |      HASH JOIN                              |              |   239K|  5623K| 23126   (1)| 00:00:01 |
|  15 |       RECURSIVE WITH PUMP                   |              |       |       |            |          |
|  16 |       BUFFER SORT (REUSE)                   |              |       |       |            |          |
|  17 |        TABLE ACCESS FULL                    | RESOURCES    |  2410K|    25M|  9386   (1)| 00:00:01 |
------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
"   2 - filter( EXISTS (SELECT 0 FROM ""CTE1"" ""CTE1"" WHERE ""CTE1"".""ID_""=:B1) OR  EXISTS (SELECT 0 "
"              FROM ""CTE2"" ""CTE2"" WHERE ""CTE2"".""ID_""=:B2))"
"   4 - filter(""CTE1"".""ID_""=:B1)"
"   6 - access(""ID_""=11)"
"   7 - access(""R"".""FOLDERID_""=""C"".""ID_"")"
"  11 - filter(""CTE2"".""ID_""=:B1)"
"  13 - access(""ID_""=808965)"
"  14 - access(""R"".""FOLDERID_""=""C"".""ID_"")"

For comparison, the faster query using a UNION seems to use a better plan:

------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                      | Name          | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                               |               |     1 |    18 |       | 55733   (1)| 00:00:03 |
|   1 |  SORT AGGREGATE                                |               |     1 |    18 |       |            |          |
|*  2 |   HASH JOIN                                    |               |  2806K|    48M|    11M| 55733   (1)| 00:00:03 |
|   3 |    VIEW                                        | VW_NSO_1      |   479K|  6092K|       | 50820   (1)| 00:00:02 |
|   4 |     SORT UNIQUE                                |               |   479K|  6092K|  9424K| 50820   (1)| 00:00:02 |
|   5 |      UNION-ALL                                 |               |       |       |       |            |          |
|   6 |       VIEW                                     |               |   239K|  3046K|       | 23128   (1)| 00:00:01 |
|   7 |        UNION ALL (RECURSIVE WITH) BREADTH FIRST|               |       |       |       |            |          |
|*  8 |         INDEX UNIQUE SCAN                      | PK_RESOURCES  |     1 |     6 |       |     2   (0)| 00:00:01 |
|*  9 |         HASH JOIN                              |               |   239K|  5623K|       | 23126   (1)| 00:00:01 |
|  10 |          RECURSIVE WITH PUMP                   |               |       |       |       |            |          |
|  11 |          BUFFER SORT (REUSE)                   |               |       |       |       |            |          |
|  12 |           TABLE ACCESS FULL                    | RESOURCES     |  2410K|    25M|       |  9386   (1)| 00:00:01 |
|  13 |       VIEW                                     |               |   239K|  3046K|       | 23128   (1)| 00:00:01 |
|  14 |        UNION ALL (RECURSIVE WITH) BREADTH FIRST|               |       |       |       |            |          |
|* 15 |         INDEX UNIQUE SCAN                      | PK_RESOURCES  |     1 |     6 |       |     2   (0)| 00:00:01 |
|* 16 |         HASH JOIN                              |               |   239K|  5623K|       | 23126   (1)| 00:00:01 |
|  17 |          RECURSIVE WITH PUMP                   |               |       |       |       |            |          |
|  18 |          BUFFER SORT (REUSE)                   |               |       |       |       |            |          |
|  19 |           TABLE ACCESS FULL                    | RESOURCES     |  2410K|    25M|       |  9386   (1)| 00:00:01 |
|  20 |    INDEX FAST FULL SCAN                        | FOLDERIDINDEX |  2410K|    11M|       |  2392   (1)| 00:00:01 |
------------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
"   2 - access(""R"".""FOLDERID_""=""ID_"")"
"   8 - access(""ID_""=11)"
"   9 - access(""R"".""FOLDERID_""=""C"".""ID_"")"
"  15 - access(""ID_""=808965)"
"  16 - access(""R"".""FOLDERID_""=""C"".""ID_"")"
ahu
  • 1,492
  • 12
  • 11
  • 1
    You could try adding an index hint to the main query, or maybe cardinality hints for both CTE references - if those are not actually finding many rows then the 239k cardinality estimate might not be helping. – Alex Poole Dec 19 '22 at 18:34
  • Thanks @AlexPoole, but an index hint sadly didn't change anything. Cardinality hints are no option here, because the query may actually return that many rows. I still tried to play around with those, but they had no effect (maybe I specified them in a wrong way - they don't seem to be documented officially, and I used the syntax of some unofficial sources) – ahu Dec 20 '22 at 08:22

0 Answers0