1

I am using SQL Server 2016 Enterprise Edition and saw some curious behavior in regards to parameterizing a LIKE predicate.

Here is some sample code to demonstrate the issue. The CUSTLABEL field has an index on it. The unparameterized last query has a significantly lower cost:

declare @custlabel_wildcard varchar(50) = 'Patient%'
declare @custlabel_no_wildcard varchar(50) = 'Patient'

-- Cost .23
select count(*) from GR_CUST_FIELD_DATA where CUSTLABEL like @custlabel_wildcard
-- Cost .23
select count(*) from GR_CUST_FIELD_DATA where CUSTLABEL like @custlabel_no_wildcard + '%'
-- Cost .067
select count(*) from GR_CUST_FIELD_DATA where CUSTLABEL like 'Patient%'

The plan for the first query is here: https://www.brentozar.com/pastetheplan/?id=BkzUb6OEd

The plan for the last query is here: https://www.brentozar.com/pastetheplan/?id=H1r9b6OVu

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
fthurber
  • 11
  • 1

1 Answers1

2

The difference is just due to different cardinality estimates in the parameterised case.

In the case of the literal it can tell at compile time that it needs to do a seek predicate on the range >'PatienSþ' and < 'PatienU' (these are calculated based on the literal and your collation) and produces pretty accurate cardinality estimates for that seek.

enter image description here

In the parameterised case it defers working out the exact index range to be seeked until runtime. It has a compute scalar that returns

  • [Expr1006] = Scalar Operator(LikeRangeStart([@custlabel_wildcard])),
  • [Expr1007] = Scalar Operator(LikeRangeEnd([@custlabel_wildcard])),
  • [Expr1008] = Scalar Operator(LikeRangeInfo([@custlabel_wildcard]))

The values Expr1006 and Expr1007 are used in the index seek operation instead. This additional level of abstraction means that it falls back to guesses on the cardinality of the predicate.

enter image description here

The seek now is costed as though it will need to read 45,883 rows (and the stream aggregate as though it needs to process that many). The reality is that it still reads and returns the same number of rows as the first plan though so this is an over costing.

You might wonder how the above works in the case a parameter value starting with a wildcard is passed to @custlabel. In that case the dynamically calculated start and end of the seek range would cover the entire domain of non null values that can be stored in that column (the seek would then read all rows where CUSTLABEL IS NOT NULL and the residual predicate on the LIKE expression would retain the ones that actually match).

Martin Smith
  • 438,706
  • 87
  • 741
  • 845
  • Would it be safe to say then that this over costing is a chimera; all the queries would have the same actual running time, and it would be pointless convert my code to use the literal as in the last example? – fthurber Mar 24 '21 at 17:37
  • @fthurber - yes. There is a trivial cost associated with the dynamic seek apparatus but if you compare runtime performance this would be pretty near unnoticeable. If you convert the code to use the literal you would also have to pay a compilation cost for every unique literal value that you are passing – Martin Smith Mar 24 '21 at 17:43