0

We have a self-referenced table like this

CREATE TABLE Categories(
Id int IDENTITY(1,1) NOT NULL,
Title nvarchar(200) NOT NULL,
ParentId int NULL,
CONSTRAINT PK_Structures PRIMARY KEY CLUSTERED 
(
    Id ASC
)
CREATE NONCLUSTERED INDEX IX_Structures_ParentId ON Categories
(
    ParentId ASC
)

And a recursive cte to get all ancestors:

Create View Ancestors
as
with A(Id, ParentId) as 
(
    select Id, Id from Categories 
    union all
    select e.Id, p.ParentId from Categories e 
    join A p on e.ParentId = p.Id
)
select * from A

Now we query all ancestors of a given Category like:

select * from Ancestors where Id = 1234

It takes 11 seconds for a table just containing 100000 categories, and the execution plan isExecution Plan. The query returns 5 rows for the given Id

I know I can greatly improve the performance by using hierarchyid, also I know that sometimes using while can be more performant, but in a simple case like this, I expect to see a much better performance. Also, please note that I already have an index on ParentId

(The picture shows the table structure which is the actual name of Category table mentioned in the question.

Is there a tuning to greatly improve this performance?

Alireza
  • 5,421
  • 5
  • 34
  • 67
  • Looking at your Query Plan, SQL server is spending most of it's time in your Index Seek. You should drill down and see why it takes so long to find the one row with Id = 1234. Likely so poorly performing indices. – Aron Nov 26 '15 at 07:57
  • In theory, it should be able to find that row using primary key, get its parentId, find the parent row, get its parentId, find the grandparent row,... until it reaches the root. But this is not happening. – Alireza Nov 26 '15 at 08:09
  • That is what it is doing, its just taking a long time doing the first part. Now try figure out WHY. I really think there is something wrong with your index. – Aron Nov 26 '15 at 08:24
  • In addition to what "@Vladimir" said,you can also create index on view on id column.Or don't use view at all. – KumarHarsh Nov 26 '15 at 11:15

2 Answers2

2

Well. It turns out the reason for the slowness, and the fix are far more interesting than anticipated.

Sql server optimizes the queries based on their definition and not by what semantic meaning they might have. The view in question starts with all Categories and adds new rows by finding elements from the CTE itself and their children. Now the way to find all rows in which some row has appeared as a child, you need to calculate the whole query and then filter it out. Only the human reader understands that the query calculates all the descendants of any Category, which of course also has all Ancestors of any Category. Then you know you can start from bottom and find parents recursively. This is not apparent from the query definition, only from its semantic meaning.

Rewriting the view as follows will make it fast:

Create View Ancestors
as
with A(Id, ParentId) as 
(
    select Id, Id from Categories 
    union all
    select p.Id, e.ParentId from Categories e 
    join A p on e.Id = p.ParentId
)
select * from A

This view creates almost the same result as the view in question. The only difference is that it also shows null as an ancestor for all Categories, which makes no difference for our usage.

This view starts to build the hierarchy from bottom and goes up, which is compatible with the way we intend to query it.

Alireza
  • 5,421
  • 5
  • 34
  • 67
0

How does the execution plan look like if you put the filter condition inside the CTE?

with A(Id, ParentId) as 
(
    select Id, Id 
    from Categories 
    WHERE Categories.ID = 1234

    union all

    select e.Id, p.ParentId 
    from Categories e 
    join A p on e.ParentId = p.Id
)
select * 
from A;
Vladimir Baranov
  • 31,799
  • 5
  • 53
  • 90