Optimizing Oracle CONNECT BY when used with WHERE clause

Question

Oracle START WITH ... CONNECT BY clause is applied before applying WHERE condition in the same query. Thus, WHERE constraints won't help optimize CONNECT BY.

For example, the following query will likely perform full table scan (ignoring selectivity on dept_id):

SELECT * FROM employees 
WHERE dept_id = 'SALE'
START WITH manager_id is null
CONNECT BY PRIOR employee_id = manager_id

I tried to improve performance in 2 ways:

query A:

SELECT * FROM employees 
START WITH manager_id is null AND dept_id = 'SALE'
CONNECT BY PRIOR employee_id = manager_id

query B:

SELECT * FROM (
               SELECT * FROM employees 
                WHERE dept_id = 'SALE'
              )
START WITH manager_id is null
CONNECT BY PRIOR employee_id = manager_id

While both queries did much better than original, on Oracle 10g Release 2, query B did performed much better than A.

Did you have similar performance optimization to deal with with respect to CONNECT BY and WHERE clauses? How would you explain query B doing much better than query A?

Show the explain plans, maybe even the trace files and what kind of indexes do you have? Anyway I think you yourself have already explained why B is faster, B is faster because the where is evaluated first. — tuinstoel, Jul 04 '09 at 05:42
null values are not placed in indexes so perhaps you can replace what null means with 0 or -1 or some such to allow the index to work. — Martlark, Jul 04 '09 at 09:39
@tuinstoel - yes, it looks that reducing data set that CONNECT BY operates on is better optimization than reducing initial set of START WITH. I was thinking more towards various optimization approaches: I tried to play with constraints inside of CONNECT BY but this wasn't successful. — topchef, Jul 04 '09 at 16:10

Tony Andrews · Accepted Answer · 2009-07-04T12:21:47.090

17

Query A says start with managers in the Sales department and then get all their employees. Oracle doesn't "know" that all the employees returned be the query will be in the Sales department, so it can't use that information to reduce the set of data to work with before performing the CONNECT BY.

Query B explicitly reduces the set of data to be worked on to just those employees in Sales, which Oracle can then do before performing the CONNECT BY.

edited Jul 04 '09 at 12:21

answered Jul 04 '09 at 10:04

Tony Andrews

129,880
21
220
259

please note that this answer only explain why B is better than A, but the solution C provided by @jluu did much better for me – Christophe Blin Jan 05 '17 at 13:30
1

The answer for A being faster than the original lies in the fact that Oracle evaluate clauses in this order : FROM (which include JOINs), CONNECT BY (which include START WITH), WHERE, GROUP BY, HAVING, SELECT, ORDER BY – Jean-Philippe Martin Jun 20 '17 at 22:12

score 7 · Answer 2 · edited Dec 27 '11 at 12:23

7

This should give the ultimate performance:

CREATE INDEX i_employees_employee_manager_dept ON employees (employee_id,manager_id,dept_id);
CREATE INDEX i_employees_manager_employee_dept ON employees (manager_id,employee_id,dept_id);

SELECT * FROM employees  
START WITH manager_id is null AND dept_id = 'SALE' 
CONNECT BY PRIOR employee_id = manager_id AND dept_id = 'SALE'

Note that you do need both index and both AND conditions for the optimization to work.

edited Dec 27 '11 at 12:23

Aliaksei Kliuchnikau

13,589
4
59
72

answered Dec 27 '11 at 09:25

jluu

71
1
1

Can we rewrite it as `from employees join dual on dept_id = 'SALE' ` to avoid having to repeat this filter twice ? – Johan Boulé Jul 13 '21 at 21:24

score 2 · Answer 3 · answered Dec 03 '13 at 21:24

This is a similar query, long story short it worked faster using the nested sql than the double connect by prior option.

'SELECT level, XMLElement("elemento", XMLAttributes(codigo_funcion as "Codigo",
                                                    nombre_funcion as "Nombre",
                                                    objetivos as "Objetivos",
                                                     descripcion as "Descripción",
                                                    ''rightHanging'' as "layout"))
   FROM (
           SELECT * FROM dithe_codigo_funcion 
           WHERE nodo_raiz = ''PEP''
    )      
   START WITH codigo_funcion = ''PEP'' 
   CONNECT BY PRIOR codigo_funcion = nivel_anterior';

So my recommendation without much expertise is to use the nested sql to filter.

score 1 · Answer 4 · answered Jul 04 '09 at 11:39

1

What are the indexes on employees? You better have an index on employeeid. And you likely do have one as a consequence of declaring employeeid as the primary key.

You might get better performance with an index on managerid as well. Try it. This has to be balanced against slower performance when inserting new employees or reorganizing managing relationships.

answered Jul 04 '09 at 11:39

Walter Mitty

18,205
2
28
58

Walter, all indexes are there. Thanks. – topchef Jul 04 '09 at 15:03

Optimizing Oracle CONNECT BY when used with WHERE clause

4 Answers4

Linked