1

I have a table with hierarchical data (contains ~5.000.000 records). I use the following query to retrieve all children records of a specific record:

with TEMP_PACKAGES as 
(
    select ID AS PACKAGE_ID,PARENT_PACKAGE_ID,1 as LEVEL  from PACKAGES where PACKAGES.ID = 5405988
    union all
    select A.ID AS PACKAGE_ID, B.PARENT_PACKAGE_ID,B.LEVEL+1 AS LEVEL 
    from PACKAGES as A  join TEMP_PACKAGES B on (A.PARENT_PACKAGE_ID = B.PACKAGE_ID)
)
select PACKAGE_ID,PARENT_PACKAGE_ID,LEVEL from TEMP_PACKAGES

so far so good, the above query executed instantly (0 ms).

Now, when I add one more field (name:RESERVED) on the select, the query execution time goes from 0ms to 15000ms (15") (!):

with TEMP_PACKAGES as 
(
    select ID AS PACKAGE_ID,PARENT_PACKAGE_ID,RESERVED,1 as LEVEL  from PACKAGES where PACKAGES.ID = 5405988
    union all
    select A.ID AS PACKAGE_ID, B.PARENT_PACKAGE_ID,A.RESERVED,B.LEVEL+1 AS LEVEL  
    from PACKAGES as A  join TEMP_PACKAGES B on (A.PARENT_PACKAGE_ID = B.PACKAGE_ID)
)
select PACKAGE_ID,PARENT_PACKAGE_ID,RESERVED,LEVEL from TEMP_PACKAGES

Note that:

  1. All the appropriate indexes exists (ID,PARENT_PACKAGE_ID)
  2. The type of RESERVED field is bit(NULL)

Any ideas why this happening?

Also note that if I modify the query as this:

with TEMP_PACKAGES as 
(
    select ID AS PACKAGE_ID,PARENT_PACKAGE_ID,1 as LEVEL  from PACKAGES where PACKAGES.ID = 5405988
    union all
    select A.ID AS PACKAGE_ID, B.PARENT_PACKAGE_ID,B.LEVEL+1 AS LEVEL  
    from PACKAGES as A  join TEMP_PACKAGES B on (A.PARENT_PACKAGE_ID = B.PACKAGE_ID)
)
select  P.ID,P.PARENT_PACKAGE_ID,P.RESERVED,TP.LEVEL
from    TEMP_PACKAGES as TP join PACKAGES as P on TP.PACKAGE_ID=P.ID

the performance is also instantly (0ms), as the first query.

Update (2022.04.13)

thank you for your answers. I attached both execution plans (fast query & slow query) as many of you have requested.

Also, the SQL server edition is 2008 64bit (SP3).

Execution plans image

  • Aliases like `a` is for `Customer` and `b` is for `Account`, or `t1` is for `Purchase` and `t2` is for `Transaction` aren't helpful for you or others that want to read your code. Use meaningful and *consistent* aliases for your objects. I suggest a read of [Bad Habits to Kick : Using table aliases like (a, b, c) or (t1, t2, t3)](https://sqlblog.org/2009/10/08/bad-habits-to-kick-using-table-aliases-like-a-b-c-or-t1-t2-t3). `P` for Packages, and `TP` for Temp_packages would make far more sense here. – Thom A Apr 12 '22 at 11:46
  • 1
    Execution plans tell us how any given query is being resolved by the optimizer. Without the plan, I'm only guessing. Probably though, you have an index as you say, adding the RESERVED column likely causes a key or rowid lookup (along with an additional join), OR, shifts a seek to a scan. No way to be sure without the execution plan. – Grant Fritchey Apr 12 '22 at 12:13
  • 1
    For performance help, we need to see table and index definitions, and share the fast and slow query plans via https://brentozar.com/pastetheplan. This question is not answerable otherwise – Charlieface Apr 12 '22 at 12:40
  • If you use SSMS -- Microsoft's SQL Server Management Studio -- this tip is for you: Right-click in a query window, then select Show Actual Execution Plan, then run the query. The execution plan display sometimes recommends a new index to create. – O. Jones Apr 12 '22 at 14:08
  • You didn't say which SQL Server version you're using. Bit columns can sometimes trigger unexpected behavior. As mentioned, you should inspect the query plan. Also, using name RESERVED for a column is not a good idea, because now or in the future, it might be a reserved key word. Having said that, it would be interesting to know if it makes a difference if you change `RESERVED` to `CAST("RESERVED" AS tinyint) AS Rsvrd`. – Gert-Jan Apr 12 '22 at 16:28
  • Please share the plans via https://brentozar.com/pastetheplan. An image is no good, we can't see any of the detailed info for each operator. Please also show table *and index* definitions – Charlieface Apr 13 '22 at 13:31
  • Looks like the index `PACKAGES_IDX2` does not include the column `RESERVED`, you need to add that as an `INCLUDE` column – Charlieface Apr 13 '22 at 13:32

1 Answers1

0

You should include the Reserved column as part of the index on ID column. Before adding the reserved column, you query only used the index and did not touch the table for any I/O.

As soon as you added reserved column, every CTE iteration needed to look up the reserved value from the table using ID.

If you cover the reserved column with an index on ID, you will get the performance you seek.

See MS documentation on this here

DhruvJoshi
  • 17,041
  • 6
  • 41
  • 60
  • Thank you, I believe that you are right about that without the RESERVED field the query only uses information from the indexes. But this doesn't explain why the #3 query is fast (the #3 performs a join with the same table (PACKAGES), outside the CTE) – Dimitris Stoufis Apr 18 '22 at 06:43
  • @DimitrisStoufis because CTE has recursion and its hitting the same table again and again. When done outside it already has a small list of ids for which it hist index , and then gets the data using seek operation – DhruvJoshi Apr 18 '22 at 08:41