index on persisted computed column slower than index on non-persisted computed column

Question

we have a big table with a varchar(max) column containing elements that are 'inspected' with charindex. For instance:

select x from y where charindex('string',[varchar_max_field]) > 0

In order to speed that up, I created a computed column with the result of the charindex command. As a test, I created both a persisted and a non-persisted version of that colum and created a nc-index for each, containing only the computed column:

CREATE TABLE [schema].[table] (
  [other fields...]
  [State]  [NVARCHAR](MAX) NULL,  /* Contains JSON information */
  [NonPersistedColumn]  AS (CHARINDEX('something',[State],(1))),
  [PersistedColumn]     AS (CHARINDEX('something',[State],(1))) PERSISTED )

CREATE NONCLUSTERED INDEX [ix_NonPersistedColumn] ON [schema].[table]
    ([NonPersistedColumn]  ASC )
CREATE NONCLUSTERED INDEX [ix_PersistedColumn]    ON [schema].[table]
    ([PersistedColumn]     ASC )

Next,

SELECT TOP (50) [NonPersistedColumn]  FROM [table]  WHERE [NonPersistedColumn]  > 0

uses an index seek on the index for the non-persisted column, as expected. However,

SELECT TOP (50) [PersistedColumn]  FROM [table]  WHERE [PersistedColumn]  > 0

uses the index of the non-persisted column (equal charindex logic, so ok) and performs an identical index seek.

If I force it to use the index on the persisted column, it reverts to a Key Lookup on the clusterd index with table's ID column as a seek predicate and the [State] column in the query plan Output List. I am only asking for the column in the nc index, it is not a covering query.

Why is it using the PK index (containing an ID column)? Is that related to the PK always being added to the nc index? In this way, there is little advantage in persisting the computed column, or am I missing something?

Links to the the plans (persisted and non-persisted, respectively): https://www.brentozar.com/pastetheplan/?id=S1zoLmEEs https://www.brentozar.com/pastetheplan/?id=S1CHwmE4j

Link to the plan for the query on the persisted column without the index hint (uses the index on the non-persisted column) https://www.brentozar.com/pastetheplan/?id=HJB6j7EVs

Please upload query plans via https://pastetheplan.com. I cannot replicate your issue using https://dbfiddle.uk/66n8U0Ob. Do you have any other columns you are selecting? — Charlieface, Oct 24 '22 at 14:42
uploaded plans above. No other columns are selected (as a test) — WillemG, Oct 24 '22 at 14:57
There is no advantage in persisting indexed columns if you don't have to. You have to (as a special case) if the column expression is nondeterministic, or if it's a floating-point type. Otherwise, the only thing persisting achieves is that the data is stored twice, once in the clustered index and once more in your index. Older versions of SQL Server (and I'm talking really old; at least 2000 or maybe even earlier) required explicit persisting for an index to even be allowed; from at least 2005 onwards this is no longer necessary since indexing implicitly persists the value in the index. — Jeroen Mostert, Oct 24 '22 at 15:07
I have repro'd it now https://dbfiddle.uk/7jxUhxoR. Seems it's this issue https://dba.stackexchange.com/a/18729/220697, and you can fix it simply by removing the other non-persisted column. Problem is basically that the computed column is expanded, then matched back, and it's matching back to the wrong column as the definition is identical. — Charlieface, Oct 24 '22 at 15:14
Note that you could instead use a filtered index `CREATE NONCLUSTERED INDEX [ix_Filtered] ON [schema].[table] (SomeColsHere) WHERE (CHARINDEX(...) > 0)` — Charlieface, Oct 24 '22 at 15:15
Ok, makes sense, thank you both for responding quickly en knowledgeably! I thought persisting could be more efficient in not having to calculate on the fly, but storing the values in the index does indeed remove that advantage. — WillemG, Oct 24 '22 at 15:18

index on persisted computed column slower than index on non-persisted computed column

0 Answers0