0

We have an application that executes some queries we cannot change, something like this (I used StackOverflow2013 database to demonstrate it):

SELECT *
FROM dbo.Posts p
WHERE CHARINDEX(N'Aptana', p.Title) > 0

Our DB has a similar structure - row is very wide, consists of a lot of differenet columns, include nvarchar(smth) and nvarchar(max) data types.

This query has this query plan (as ours, clustered index scan), obviously:

  |--Clustered Index Scan(OBJECT:([StackOverflow2013].[dbo].[Posts].[PK_Posts_Id] AS [p]), WHERE:(charindex(N'Aptana',[StackOverflow2013].[dbo].[Posts].[Title] as [p].[Title])>(0))) 

actual execution plan with clustered index scan

We have an index on this column, and I created one on dbo.Posts (Title):

CREATE INDEX myPleasureSort ON dbo.Posts (Title);

I cannot change query, but I can create indexes and use plan guides to add INDEX HINTs. I have to say, that our users always use this kind of queries to find just a few rows, maybe 100 out of 50 millions, so nonclustered index scan should be faster and less resource intensive.

So when I try this:

SELECT *
FROM dbo.Posts p
WHERE CHARINDEX(N'Aptana', p.Title) > 0
OPTION (MAXDOP 1, TABLE HINT(p, INDEX (myPleasureSort)))

It results in this:

|--Nested Loops(Inner Join, OUTER REFERENCES:([p].[Id], [Expr1002]) WITH UNORDERED PREFETCH)
   |--Sort(ORDER BY:([p].[Id] ASC))
   |    |--Index Scan(OBJECT:([StackOverflow2013].[dbo].[Posts].[myPleasureSort] AS [p]),  WHERE:(charindex(N'Aptana',[StackOverflow2013].[dbo].[Posts].[Title] as [p].[Title])>(0)))
   |--Clustered Index Seek(OBJECT:([StackOverflow2013].[dbo].[Posts].[PK_Posts_Id] AS [p]), SEEK:([p].[Id]=[StackOverflow2013].[dbo].[Posts].[Id] as [p].[Id]) LOOKUP ORDERED FORWARD)

actual execution plan with key lookup and sort

And this is my question. Why do I have this sort operation before Key Lookup? I think because of it I have a huge memory grant and I don't want it in production.

The query memory grant detected "ExcessiveGrant", which may impact the reliability. Grant size: Initial 566496 KB, Final 566496 KB, Used 216 KB.

I found workaround with this index:

CREATE INDEX myPleasure ON dbo.Posts (Id, Title);

And for this query I have next query plan:

SELECT *
FROM dbo.Posts p
WHERE CHARINDEX(N'Aptana', p.Title) > 0
OPTION (MAXDOP 1, TABLE HINT(p, INDEX (myPleasure)))

  |--Nested Loops(Inner Join, OUTER REFERENCES:([p].[Id], [Expr1002]) WITH UNORDERED PREFETCH)
       |--Index Scan(OBJECT:([StackOverflow2013].[dbo].[Posts].[myPleasure] AS [p]),  WHERE:(charindex(N'Aptana',[StackOverflow2013].[dbo].[Posts].[Title] as [p].[Title])>(0)) ORDERED FORWARD)
       |--Clustered Index Seek(OBJECT:([StackOverflow2013].[dbo].[Posts].[PK_Posts_Id] AS [p]), SEEK:([p].[Id]=[StackOverflow2013].[dbo].[Posts].[Id] as [p].[Id]) LOOKUP ORDERED FORWARD)

actual execution plan with key lookup without sort

But I would prefer to use Index just on nvarchar column, to have a possibility to use it with something like LIKE 'str%'.

Thank you in advance, and please execuse my poor English.

UPDATE: SELECT @@VERSION:

Microsoft SQL Server 2017 (RTM-CU20) (KB4541283) - 14.0.3294.2 (X64)

UPDATE 2: Thanks to @MattM, it looks like my case: Why is Sort operation before Nested Loops (Inner Join)?

  • 2
    `WHERE CHARINDEX(N'Aptana', p.Title) > 0` isn't SARGable, so the index isn't going to help. Therefore SQL Server feels the easiest way is scan your entire Clustered Index (which likely will be the fastest as it's going to be a covering index)/ – Thom A Jun 24 '20 at 08:13
  • Thanks, I understand that. But I want SQL server to scan nonclustered index on that column, because it much smaller than clustered index and I know for sure that it will return only a few rows – Oleg Aronov Jun 24 '20 at 08:29
  • But you're using a `SELECT *`, so a column that includes every column is going to (likely) be far more efficient. Otherwise the instance will need to scan said non-clustered index, and then perform a key-lookup; and that is exactly what it's doing when you force the index. Let the Data Engine make the decisions on how to get the data, it knows what it's doing. – Thom A Jun 24 '20 at 08:31
  • I cannot change the query and it contains SELECT *, but nonclustered index scan with key lookup still much faster than clustered index scan. – Oleg Aronov Jun 24 '20 at 08:34
  • *You* might know that there's only a few rows, but SQL Server, doesn't. If you want it to be able to use appropriate indexes, fix the query; make it SARGable. – Thom A Jun 24 '20 at 08:35
  • I want, but I cannot and that's why I want to add an index hint with plan guide. But it has strange side-effect with sort operator before Key Lookup. – Oleg Aronov Jun 24 '20 at 08:38

1 Answers1

0

SQL Server really does, by and large, have your best interests at heart. If it's performing a clustered index scan or using a sort operator, it's probably because the query optimiser has found the best plan it could in the time it had to search for one.

The memory grant is what SQL Server determines the query will need to complete -- this includes the memory needed to read all the data. An excessive memory grant means SQL Server has used way less memory than the optimiser thought the query needed.

The query optimiser estimates this amount of memory based on, amongst other things, the potential size of the data to be returned.

(The optimiser would have accounted for the Sort operator in your query plan. But I suspect that is not the issue here.)

My guess is a number of columns in your SELECT * statement are large objects -- VARCHAR and/or NVARCHAR data types of either 100+ or MAX data length -- that do not use their entire length.

As an example, let's suppose the [Title] column is an NVARHCAR(255), and most values in the column are less than 50 characters in length.

The query optimiser can't see the data in this column. Just the data type. It has to assume that any or all of the data in that column, for all the rows it's expecting to return, could be the full 255 characters long.

It will, therefore, request enough memory to accommodate this column. (It probably won't get everything it asks for, because SQL Server's not a complete idiot. SQL Server will allocate a revised memory grant, but it will still be sizeable.)

When it turns out that most data in the [Title] column was less than 50 characters long, SQL Server will complain about the fact it had to clear out all that memory just for this dumb query, which didn't end up using it all anyway!

Adjusting the data types for any of these large object columns to narrower lengths will mean the query optimiser requests less memory.

But, in your case, the SELECT * and non-SARGable WHERE clause will probably still make the query run sub-optimally.

Really, the only way to improve performance on your query is to rewrite it so it's SARGable.

E.g.

SELECT  *
FROM    dbo.Posts AS p
WHERE   p.Title LIKE '%Aptana%' ;

This might even use your index without the need for a query hint.

MattM
  • 314
  • 2
  • 10
  • Thank you for your detailed answer, @MattM. I understand that my WHERE clause isn't SARGable and SQL Server thinks that it will return 33% of rows and prefer to scan clustered index. Also I understand that it reserves memory using estimated rows number and row size. But I don't understand why it uses SORT operator when I use Index hint and SQL Server performs Key Lookup. – Oleg Aronov Jun 24 '20 at 12:15
  • Not sure. This answer indicates SQL Server sorts by the clustered key column to optimise key lookups and the nested loop join: https://stackoverflow.com/questions/45360126/why-is-sort-operation-before-nested-loops-inner-join – MattM Jun 24 '20 at 22:14