9

I would expect these two SELECTs to have the same execution plan and performance. Since there is a leading wildcard on the LIKE, I expect an index scan. When I run this and look at the plans, the first SELECT behaves as expected (with a scan). But the second SELECT plan shows an index seek, and runs 20 times faster.

Code:

-- Uses index scan, as expected:
SELECT 1
    FROM AccountAction
    WHERE AccountNumber LIKE '%441025586401'

-- Uses index seek somehow, and runs much faster:
declare @empty VARCHAR(30) = ''
SELECT 1
    FROM AccountAction
    WHERE AccountNumber LIKE '%441025586401' + @empty

Question:

How does SQL Server use an index seek when the pattern starts with a wildcard?

Bonus question:

Why does concatenating an empty string change/improve the execution plan?

Details:

  • There is a non-clustered index on Accounts.AccountNumber
  • There are other indexes, but both the seek and the scan are on this index.
  • The Accounts.AccountNumber column is a nullable varchar(30)
  • The server is SQL Server 2012

Table and index definitions:

CREATE TABLE [updatable].[AccountAction](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [AccountNumber] [varchar](30) NULL,
    [Utility] [varchar](9) NOT NULL,
    [SomeData1] [varchar](10) NOT NULL,
    [SomeData2] [varchar](200) NULL,
    [SomeData3] [money] NULL,
    --...
    [Created] [datetime] NULL,
 CONSTRAINT [PK_Account] PRIMARY KEY NONCLUSTERED 
(
    [ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]


CREATE NONCLUSTERED INDEX [IX_updatable_AccountAction_AccountNumber_UtilityCode_ActionTypeCd] ON [updatable].[AccountAction]
(
    [AccountNumber] ASC,
    [Utility] ASC
)
INCLUDE ([SomeData1], [SomeData2], [SomeData3]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]


CREATE CLUSTERED INDEX [CIX_Account] ON [updatable].[AccountAction]
(
    [Created] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

NOTE: Here is the actual execution plan for the two queries. The names of the objects differ slightly from the code above because I was trying to keep the question simple.

The two execution plans.

blackboxlogic
  • 557
  • 6
  • 13
  • 1
    is there difference in actual execution or only in estimation plans? – Roman Pekar Aug 05 '13 at 19:58
  • 3
    @GordonLinoff SQL Server 2012's version number is 11, 2008 R2: 10.5, 2008: 10, etc. – swasheck Aug 05 '13 at 20:00
  • I don't know how important it could be, but the queries that you actually ran were `LIKE '%441025586401%'`, with a wildcard at the beggining and end – Lamak Aug 05 '13 at 20:03
  • 2
    Could be parameter sniffing and the estimated plan might not realize how bad that seek might actually be. What happens if you use `OPTION (RECOMPILE)` on the query (or clear out the plan cache)? – Aaron Bertrand Aug 05 '13 at 20:04
  • @RomanPekar These are acutal execution plans. – blackboxlogic Aug 05 '13 at 20:05
  • You've not posted the code for the index that's actually being used. Looks like it's seeking on an index on the AccountAction (an index named IX_updateable_Accou...). This isn't any of the indexes that you've posted nor is it on the table – swasheck Aug 05 '13 at 20:06
  • @Lamak I took trailing wildcards out for simplicity but the results are unchanged by the trailing wildcard. – blackboxlogic Aug 05 '13 at 20:06
  • 3
    Also please be sure to measure `SET STATISTICS TIME ON;` and `SET STATISTICS IO ON;` so that you can see whether the seek or scan is actually better. Do *not* rely solely on the 96% and 4% figures to give you any indication about actual performance - these numbers are largely meaningless on their own. – Aaron Bertrand Aug 05 '13 at 20:06
  • Also the table in use is AccountAction or Account? Help us match the DDL you included with the plans. Could you post the actual execition plans somewhere? – Aaron Bertrand Aug 05 '13 at 20:07
  • I've tried these queries on my SQL 2008 R2. If I see execution plan, it shows that second one SHOULD run faster, like 5% to 95%. Actual execution is much closer, but one with @empty variable still faster... – Roman Pekar Aug 05 '13 at 20:08
  • 1
    My hunch is that in your attempts to obfuscate you've left out some important details that would indicate why you get a seek. – swasheck Aug 05 '13 at 20:08
  • @swasheck, no, it's as simple as that - I've just created table, populated it with rows and these two queries shows different plans – Roman Pekar Aug 05 '13 at 20:10
  • @swasheck . . . Thank you for the version information. That's why I asked. – Gordon Linoff Aug 05 '13 at 20:14
  • @GordonLinoff no worries. TMYK. – swasheck Aug 05 '13 at 20:15
  • @RomanPekar did you run both queries with `OPTION (RECOMPILE)`? – swasheck Aug 05 '13 at 20:16
  • @swasheck it doesn't matter. Actually, I see now that one with empty variableruns longer, I think i was mistaken. – Roman Pekar Aug 05 '13 at 20:18
  • @pants I was wondering if you would post the actual queries. Perhaps even post a link on SQLFiddle with your structure and some sample data. – swasheck Aug 05 '13 at 20:24
  • With swasheck's OPTION(RECOMPILE) and AaronBertrand's STATISTICS TIME and STATISTICS IO on, it changes to using index scans for both queries. Their 'costs' are both 50% of the batch and their cpu times are very close. – blackboxlogic Aug 05 '13 at 20:27
  • 1
    pants: actually @AaronBertrand recommended that first :D. So look into the parameter sniffing thing. – swasheck Aug 05 '13 at 20:31

1 Answers1

9

These tests (database AdventureWorks2008R2) shows what happens:

SET NOCOUNT ON;
SET STATISTICS IO ON;

PRINT 'Test #1';
SELECT  p.BusinessEntityID, p.LastName
FROM    Person.Person p
WHERE   p.LastName LIKE '%be%';

PRINT 'Test #2';
DECLARE @Pattern NVARCHAR(50);
SET @Pattern=N'%be%';
SELECT  p.BusinessEntityID, p.LastName
FROM    Person.Person p
WHERE   p.LastName LIKE @Pattern;

SET STATISTICS IO OFF;
SET NOCOUNT OFF;

Results:

Test #1
Table 'Person'. Scan count 1, logical reads 106
Test #2
Table 'Person'. Scan count 1, logical reads 106

The results from SET STATISTICS IO shows that LIO are the same. But the execution plans are quite different: enter image description here

In the first test, SQL Server uses an Index Scan explicit but in the second test SQL Server uses an Index Seek which is an Index Seek - range scan. In the last case SQL Server uses a Compute Scalar operator to generate these values

[Expr1005] = Scalar Operator(LikeRangeStart([@Pattern])), 
[Expr1006] = Scalar Operator(LikeRangeEnd([@Pattern])), 
[Expr1007] = Scalar Operator(LikeRangeInfo([@Pattern]))

and, the Index Seek operator use an Seek Predicate (optimized) for a range scan (LastName > LikeRangeStart AND LastName < LikeRangeEnd) plus another unoptimized Predicate (LastName LIKE @pattern).

How can LIKE '%...' seek on an index?

My answer: it isn't a "real" Index Seek. It's a Index Seek - range scan which, in this case, has the same performance like Index Scan.

Please see, also, the difference between Index Seek and Index Scan (similar debate): So…is it a Seek or a Scan?.

Edit 1: The execution plan for OPTION(RECOMPILE) (see Aaron's recommendation please) shows, also, an Index Scan (instead of Index Seek): enter image description here

Bogdan Sahlean
  • 19,233
  • 3
  • 42
  • 57