
I have read a lot about indexes and the differences between them. Now I am working on query optimization in my project. I have created a nonclustered index that should be used during query execution, but that is not the case. Details below:

Table:

[screenshot of the por.DailyAsset table definition]

Index:

CREATE NONCLUSTERED INDEX [_IXProcedure_Deleted_Date] ON [por].[DailyAsset]
(
    [Deleted] ASC,
    [Date] DESC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
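As a side note (not part of the original question), one quick way to check whether this index is being touched at all is the index usage DMV. A minimal diagnostic sketch, run in the same database:

-- Diagnostic sketch only: has the nonclustered index recorded any seeks,
-- scans or lookups since the last service restart?
SELECT i.name, s.user_seeks, s.user_scans, s.user_lookups
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
    ON  s.object_id = i.object_id
    AND s.index_id = i.index_id
    AND s.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID(N'por.DailyAsset');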

Query generated by Entity Framework:

exec sp_executesql N'SELECT 
[Project1].[C1] AS [C1], 
[Project1].[AssetId] AS [AssetId], 
[Project1].[Active] AS [Active], 
[Project1].[Date] AS [Date]
FROM ( SELECT 
    [Extent1].[AssetId] AS [AssetId], 
    [Extent1].[Active] AS [Active], 
    [Extent1].[Date] AS [Date], 
    1 AS [C1]
    FROM [por].[DailyAsset] AS [Extent1]
    WHERE (0 = [Extent1].[Deleted]) AND ([Extent1].[Date] < @p__linq__0)
)  AS [Project1]
ORDER BY [Project1].[Date] DESC',N'@p__linq__0 datetime2(7)',@p__linq__0='2014-05-01 00:00:00'

Execution plan:

[screenshot of the execution plan]

Missing index details:

The Query Processor estimates that implementing the following index could improve the query cost by 23.8027%.


CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [por].[DailyAsset] ([Deleted],[Date])
INCLUDE ([AssetId],[Active])
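For completeness, a filled-in version of that template might look like the following (the index name is hypothetical, and Date is kept descending here to match the original index definition):

CREATE NONCLUSTERED INDEX [IX_DailyAsset_Deleted_Date_Covering] -- hypothetical name
ON [por].[DailyAsset] ([Deleted] ASC, [Date] DESC)
INCLUDE ([AssetId], [Active])
GO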

I am aware that if the AssetId and Active columns are included in the index, the index will be used.

So why is the index not used without the included columns?

This is a simplified example of other queries in which all columns are fetched. The only way to get (force) an index seek is to include all columns in the index, which then has the same estimated subtree cost (obviously).

Another annoying issue is that the sort order is ignored. The Date column is in the index and set to DESCENDING, yet it is completely ignored, and of course the Sort operator takes an expensive place in the execution plan.

UPDATE 1:

As @Jayachandran pointed out, an index seek + key lookup should be used for the query above, but the covering index approach is well documented, and it assumes that the AssetId and Active columns should be included. I agree with that.

I am adding UPDATE 1 to discuss how useful a covering index really is for the query below. Same table, larger result set. As far as I can tell, none of the selected columns should have to be included in the index, so the index stays defined on the Deleted and Date columns only.

exec sp_executesql N'SELECT 
[Project1].[DailyAssetId] AS [DailyAssetId], 
[Project1].[AssetId] AS [AssetId], 
[Project1].[CreatedByUserId] AS [CreatedByUserId], 
[Project1].[UpdatedByUserId] AS [UpdatedByUserId], 
[Project1].[TimeCreated] AS [TimeCreated], 
[Project1].[TimeUpdated] AS [TimeUpdated], 
[Project1].[Deleted] AS [Deleted], 
[Project1].[TimeDeleted] AS [TimeDeleted], 
[Project1].[DeletedByUserId] AS [DeletedByUserId], 
[Project1].[Active] AS [Active], 
[Project1].[Date] AS [Date], 
[Project1].[Quantity] AS [Quantity], 
[Project1].[TotalBookValue] AS [TotalBookValue], 
[Project1].[CostPrice] AS [CostPrice], 
[Project1].[CostValue] AS [CostValue], 
[Project1].[FairPrice] AS [FairPrice], 
[Project1].[FairValue] AS [FairValue], 
[Project1].[UnsettledQuantity] AS [UnsettledQuantity], 
[Project1].[UnsettledValue] AS [UnsettledValue], 
[Project1].[SettlementDate] AS [SettlementDate], 
[Project1].[EffectiveDate] AS [EffectiveDate], 
[Project1].[PortfolioId] AS [PortfolioId]
FROM ( SELECT 
    [Extent1].[DailyAssetId] AS [DailyAssetId], 
    [Extent1].[AssetId] AS [AssetId], 
    [Extent1].[CreatedByUserId] AS [CreatedByUserId], 
    [Extent1].[UpdatedByUserId] AS [UpdatedByUserId], 
    [Extent1].[TimeCreated] AS [TimeCreated], 
    [Extent1].[TimeUpdated] AS [TimeUpdated], 
    [Extent1].[Deleted] AS [Deleted], 
    [Extent1].[TimeDeleted] AS [TimeDeleted], 
    [Extent1].[DeletedByUserId] AS [DeletedByUserId], 
    [Extent1].[Active] AS [Active], 
    [Extent1].[Date] AS [Date], 
    [Extent1].[Quantity] AS [Quantity], 
    [Extent1].[TotalBookValue] AS [TotalBookValue], 
    [Extent1].[CostPrice] AS [CostPrice], 
    [Extent1].[CostValue] AS [CostValue], 
    [Extent1].[FairPrice] AS [FairPrice], 
    [Extent1].[FairValue] AS [FairValue], 
    [Extent1].[UnsettledQuantity] AS [UnsettledQuantity], 
    [Extent1].[UnsettledValue] AS [UnsettledValue], 
    [Extent1].[SettlementDate] AS [SettlementDate], 
    [Extent1].[EffectiveDate] AS [EffectiveDate], 
    [Extent1].[PortfolioId] AS [PortfolioId]
    FROM [por].[DailyAsset] AS [Extent1]
    WHERE (0 = [Extent1].[Deleted]) AND ([Extent1].[Date] < @p__linq__0)
)  AS [Project1]
ORDER BY [Project1].[Date] DESC',N'@p__linq__0 datetime2(7)',@p__linq__0='2014-05-01 00:00:00'
bajicdusko
  • I would guess it's because the select includes the columns AssetId and Active, so using the index won't get all the required data, which forces the scan. Adding these columns to the index will let the select get all the data with just the index seek. Once you get the index seek, the sort issue will resolve on its own. Currently your index is not used, so the sorted Date in the index has no relevance. – Jay Sep 08 '14 at 12:58
  • Yes, but as I wrote at the end, let's suppose that we are selecting all columns (those queries do exist in my project, on the same table), not just these three. Then I should include all columns in the index? That is not practical nor useful... I think. – bajicdusko Sep 08 '14 at 13:02
  • It should have been an index seek with a key lookup even without the covering index. Not sure why your execution plan assumes an index scan is cheaper than that, but for some reason it does; maybe incorrect stats or something. But yes, a covering index is a well-documented method for query tuning. – Jay Sep 08 '14 at 13:08
  • I would swap Date and Deleted in the index. – Arvo Sep 08 '14 at 13:14
  • @Jayachandran: I have edited my question with UPDATE 1. Am I right with the covering index statement? – bajicdusko Sep 08 '14 at 13:22
  • @Arvo: Did that... no change. I suppose you suggested this because the Deleted column is more unique than Date? – bajicdusko Sep 08 '14 at 13:23
  • @helionprime, in the UPDATE 1 case, you will need to include all the selected columns in the index. – Jay Sep 08 '14 at 13:30
  • CREATE NONCLUSTERED INDEX [] ON [por].[DailyAsset] ([Deleted],[Date]) INCLUDE ([AssetId],[Active], all the columns in the select here) – Jay Sep 08 '14 at 13:30

2 Answers


The choice between a scan and a seek (with key lookup) in this case comes down to the number of rows being returned. The volume is too great, so the optimizer has chosen the cheaper plan: just scan the whole table. This will be faster than using the NC index.

Imagine if you had forced it to use the NC index and it had to do a key lookup for 40% of the rows in the table. This would be like a foreach loop executing many times. So SQL Server has chosen to just scan the table, because that would be faster than the loop.

Regarding your question about how to account for other columns that might be needed by other queries, there are really a couple of choices. You could create a covering index that includes the most commonly used columns, or you could change the primary key to orient it towards your most common access path, i.e. by Date, Deleted and an identity column for uniqueness.
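A rough sketch of the second option, using the existing DailyAssetId for uniqueness rather than adding a new identity column (the constraint and index names are assumed, and any foreign keys referencing the primary key would have to be dropped and re-created around this):

-- Sketch only: re-orient the clustered index towards the common access path.
ALTER TABLE [por].[DailyAsset] DROP CONSTRAINT [PK_DailyAsset]; -- assumed PK name

-- Cluster on the filter/sort columns, with the old key appended for uniqueness.
CREATE UNIQUE CLUSTERED INDEX [CIX_DailyAsset_Deleted_Date]
ON [por].[DailyAsset] ([Deleted] ASC, [Date] DESC, [DailyAssetId] ASC);

-- Keep the primary key, but as a nonclustered constraint.
ALTER TABLE [por].[DailyAsset]
ADD CONSTRAINT [PK_DailyAsset] PRIMARY KEY NONCLUSTERED ([DailyAssetId]);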

On another note, using a GUID for a primary key causes all sorts of problems with your clustered index and all other indexes (because the key for the PK will be included in all other indexes). The random ordering of GUIDs causes rows to be inserted in random order in pages. Because the index is ordered, pages must constantly be split to make room for new rows. It would be far better to use a key that naturally increments, and this would likely help the problem above as well, depending on the types of queries that are written.

Mark Wojciechowicz
  • In combination with @ZoffDino's comment, everything clears up for me. – bajicdusko Sep 08 '14 at 13:42
  • For the sake of discussion, about the GUID as PK you mentioned: SQL Server introduced NEWSEQUENTIALID() in the 2005 version. It should solve the problems you mentioned, or...? – bajicdusko Sep 08 '14 at 13:52
  • Yes, this will greatly reduce the problem provided the function is always run from the same computer. A Windows restart would cause it to reset, but that should be so infrequent as not to matter. I would also consider the size of the PK since it will be part of all of the other indexes. The smaller the footprint you can make, the smaller the row size, the more rows per page, and the fewer pages. This could have a significant impact on performance. – Mark Wojciechowicz Sep 08 '14 at 14:12 (see the NEWSEQUENTIALID() sketch below)
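Regarding the NEWSEQUENTIALID() point above: it can only appear in a DEFAULT constraint, so a minimal hypothetical sketch would be the following (the constraint name is assumed, and any existing NEWID() default would have to be dropped first):

-- Sketch only: new GUIDs become roughly ascending, so inserts append at the
-- end of the clustered index instead of splitting pages at random positions.
ALTER TABLE [por].[DailyAsset]
ADD CONSTRAINT [DF_DailyAsset_DailyAssetId] -- assumed name
DEFAULT NEWSEQUENTIALID() FOR [DailyAssetId];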

The ideal index for a specific query is one where (1) all the fields in the WHERE clause are in the index key, and (2) all the fields in the SELECT clause are included in the index. If (1) is not met, SQL Server will weigh the cost of accessing multiple indexes and pick the one it thinks is fastest; if (2) is not met, that means an expensive Key Lookup operation. SQL Server may still think it's worth it if the index has very high selectivity (few duplicate values).

In your case, condition (2) is clearly not met. SQL Server thinks that the Key Lookup operation is too expensive compared to a clustered index scan, so it chose the latter. You can force SQL Server to use a particular index, but I don't know how to do that with Entity Framework.
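Outside of Entity Framework (for example when testing in Management Studio), a table hint can force the index so the forced plan can be compared against the scan. This is a diagnostic sketch only, with a date literal standing in for the EF parameter:

-- Sketch: force the nonclustered index to see the seek + key lookup plan.
SELECT [AssetId], [Active], [Date]
FROM [por].[DailyAsset] WITH (INDEX([_IXProcedure_Deleted_Date]))
WHERE [Deleted] = 0
  AND [Date] < '2014-05-01'
ORDER BY [Date] DESC;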

If this query must be speedy for you, create the index as SQL Server says.

Code Different
  • I have edited the index as SQL Server says, and the profiler shows the same duration of execution (on the current data set) with far fewer reads (360 reads with my index and 66 reads with the index SQL Server suggested). Yours and Mark's answers helped me a lot. – bajicdusko Sep 08 '14 at 13:45