0

I have a query generated by EF5 code where I'm using SqlFunctions.PatIndex and the query takes 5 minutes to run. If I make a change to the SQL to use like operator instead then it works sub-second time. From what I've read PatIndex is supposed to work at least as well as the like operator. This appears to be some interaction with the row_number() OVER portion because if you execute just the innermost select with the patindex clause it comes back in sub-second time as well. Can anyone tell me why the row_number() OVER is killing the PATINDEX query but not the one using like?

This one works:

exec sp_executesql N'SELECT TOP (10) 
[Project1].[C1] AS [C1], 
[Project1].[PersonId] AS [PersonId], 
[Project1].[FirstName] AS [FirstName], 
[Project1].[MiddleName] AS [MiddleName], 
[Project1].[LastName] AS [LastName], 
[Project1].[Suffix] AS [Suffix], 
[Project1].[DateOfBirth] AS [DateOfBirth], 
[Project1].[Ssn] AS [Ssn], 
[Project1].[CenterNumber] AS [CenterNumber], 
[Project1].[GuarantorNumber] AS [GuarantorNumber], 
[Project1].[GUNumber] AS [GUNumber], 
[Project1].[IndustrialClientNumber] AS [IndustrialClientNumber], 
[Project1].[Employer] AS [Employer]
FROM ( SELECT [Project1].[PersonId] AS [PersonId], [Project1].[CenterNumber] AS [CenterNumber], [Project1].[GuarantorNumber] AS [GuarantorNumber], [Project1].[GUNumber] AS [GUNumber], [Project1].[IndustrialClientNumber] AS [IndustrialClientNumber], [Project1].[Employer] AS [Employer], [Project1].[FirstName] AS [FirstName], [Project1].[MiddleName] AS [MiddleName], [Project1].[LastName] AS [LastName], [Project1].[Suffix] AS [Suffix], [Project1].[DateOfBirth] AS [DateOfBirth], [Project1].[Ssn] AS [Ssn], [Project1].[C1] AS [C1], row_number() OVER (ORDER BY [Project1].[FirstName] ASC) AS [row_number]
       FROM ( SELECT 
              [Extent1].[PersonId] AS [PersonId], 
              [Extent1].[CenterNumber] AS [CenterNumber], 
              [Extent1].[GuarantorNumber] AS [GuarantorNumber], 
              [Extent1].[GUNumber] AS [GUNumber], 
              [Extent1].[IndustrialClientNumber] AS [IndustrialClientNumber], 
              [Extent1].[Employer] AS [Employer], 
              [Extent2].[FirstName] AS [FirstName], 
              [Extent2].[MiddleName] AS [MiddleName], 
              [Extent2].[LastName] AS [LastName], 
              [Extent2].[Suffix] AS [Suffix], 
              [Extent2].[DateOfBirth] AS [DateOfBirth], 
              [Extent2].[Ssn] AS [Ssn], 
              ''0X0X'' AS [C1]
              FROM  [dbo].[vGuarantor] AS [Extent1]
              INNER JOIN [dbo].[vPerson] AS [Extent2] ON [Extent1].[PersonId] = [Extent2].[PersonId]
              WHERE ([Extent1].[CenterNumber] = @p__linq__0) AND ***([Extent1].[GuarantorNumber] like @p__linq__1)***
       )  AS [Project1]
)  AS [Project1]
WHERE [Project1].[row_number] > 0
ORDER BY [Project1].[FirstName] ASC
',N'@p__linq__0 nvarchar(4000),@p__linq__1 nvarchar(4000)',@p__linq__0=N'2',@p__linq__1=N'13100%'

This one doesn’t:

exec sp_executesql N'SELECT TOP (10) 
[Project1].[C1] AS [C1], 
[Project1].[PersonId] AS [PersonId], 
[Project1].[FirstName] AS [FirstName], 
[Project1].[MiddleName] AS [MiddleName], 
[Project1].[LastName] AS [LastName], 
[Project1].[Suffix] AS [Suffix], 
[Project1].[DateOfBirth] AS [DateOfBirth], 
[Project1].[Ssn] AS [Ssn], 
[Project1].[CenterNumber] AS [CenterNumber], 
[Project1].[GuarantorNumber] AS [GuarantorNumber], 
[Project1].[GUNumber] AS [GUNumber], 
[Project1].[IndustrialClientNumber] AS [IndustrialClientNumber], 
[Project1].[Employer] AS [Employer]
FROM ( SELECT [Project1].[PersonId] AS [PersonId], [Project1].[CenterNumber] AS [CenterNumber], [Project1].[GuarantorNumber] AS [GuarantorNumber], [Project1].[GUNumber] AS [GUNumber], [Project1].[IndustrialClientNumber] AS [IndustrialClientNumber], [Project1].[Employer] AS [Employer], [Project1].[FirstName] AS [FirstName], [Project1].[MiddleName] AS [MiddleName], [Project1].[LastName] AS [LastName], [Project1].[Suffix] AS [Suffix], [Project1].[DateOfBirth] AS [DateOfBirth], [Project1].[Ssn] AS [Ssn], [Project1].[C1] AS [C1], row_number() OVER (ORDER BY [Project1].[FirstName] ASC) AS [row_number]
       FROM ( SELECT 
              [Extent1].[PersonId] AS [PersonId], 
              [Extent1].[CenterNumber] AS [CenterNumber], 
              [Extent1].[GuarantorNumber] AS [GuarantorNumber], 
              [Extent1].[GUNumber] AS [GUNumber], 
              [Extent1].[IndustrialClientNumber] AS [IndustrialClientNumber], 
              [Extent1].[Employer] AS [Employer], 
              [Extent2].[FirstName] AS [FirstName], 
              [Extent2].[MiddleName] AS [MiddleName], 
              [Extent2].[LastName] AS [LastName], 
              [Extent2].[Suffix] AS [Suffix], 
              [Extent2].[DateOfBirth] AS [DateOfBirth], 
              [Extent2].[Ssn] AS [Ssn], 
              ''0X0X'' AS [C1]
              FROM  [dbo].[vGuarantor] AS [Extent1]
              INNER JOIN [dbo].[vPerson] AS [Extent2] ON [Extent1].[PersonId] = [Extent2].[PersonId]
              WHERE ([Extent1].[CenterNumber] = @p__linq__0) AND ***(PATINDEX(@p__linq__1, [Extent1].[GuarantorNumber]) > 0)***
       )  AS [Project1]
)  AS [Project1]
WHERE [Project1].[row_number] > 0
ORDER BY [Project1].[FirstName] ASC
',N'@p__linq__0 nvarchar(4000),@p__linq__1 nvarchar(4000)',@p__linq__0=N'2',@p__linq__1=N'13100%'

Here are the two tables:

/****** Object:  Table [Person]    Script Date: 5/30/2013 8:17:15 AM ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

SET ANSI_PADDING ON
GO

CREATE TABLE [Person](
    [PersonId] [uniqueidentifier] NOT NULL,
    [FirstName] [nvarchar](255) NULL,
    [MiddleName] [nvarchar](255) NULL,
    [LastName] [nvarchar](255) NULL,
    [DateOfBirth] [datetime] NULL,
    [DateOfBirthIsGuess] [bit] NULL,
    [NickName] [nvarchar](255) NULL,
    [Suffix] [nchar](10) NULL,
    [Gender] [char](1) NULL,
    [SSN] [numeric](9, 0) NULL,
 CONSTRAINT [PK__Person__04E4BC85] PRIMARY KEY CLUSTERED 
(
    [PersonId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 95) ON [PRIMARY]
) ON [PRIMARY]

GO

SET ANSI_PADDING OFF
GO

/****** Object:  Index [PK__Person__04E4BC85]    Script Date: 7/18/2013 8:19:16 AM ******/
ALTER TABLE [PatientFirst].[Person] ADD  CONSTRAINT [PK__Person__04E4BC85] PRIMARY KEY CLUSTERED 
(
    [PersonId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 95) ON [PRIMARY]
GO

/****** Object:  Index [IDX_Person.LastName]    Script Date: 7/18/2013 8:19:24 AM ******/
CREATE NONCLUSTERED INDEX [IDX_Person.LastName] ON [PatientFirst].[Person]
(
    [LastName] ASC
)
INCLUDE (   [DateOfBirth],
    [FirstName],
    [MiddleName],
    [PersonId],
    [SSN],
    [Suffix]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

/****** Object:  Index [IDX_Person.FirstName]    Script Date: 7/18/2013 8:19:37 AM ******/
CREATE NONCLUSTERED INDEX [IDX_Person.FirstName] ON [PatientFirst].[Person]
(
    [FirstName] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

/****** Object:  Index [IDX_Person.DOB]    Script Date: 7/18/2013 8:19:47 AM ******/
CREATE NONCLUSTERED INDEX [IDX_Person.DOB] ON [PatientFirst].[Person]
(
    [DateOfBirth] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

/****** Object:  Table [Guarantor]    Script Date: 5/30/2013 8:17:11 AM ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

SET ANSI_PADDING ON
GO

CREATE TABLE [Guarantor](
    [GuarantorId] [uniqueidentifier] NOT NULL,
    [CenterNumber] [nvarchar](10) NOT NULL,
    [GuarantorNumber] [nvarchar](10) NOT NULL,
    [IndustrialClientNumber] [int] NULL,
    [Employer] [varchar](4000) NULL,
    [ImportDate] [datetime] NULL,
    [GUNumber]  AS (([CenterNumber]+'*')+[GuarantorNumber]),
 CONSTRAINT [PK__Guarantor__44FF419A] PRIMARY KEY CLUSTERED 
(
    [GuarantorId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 95) ON [PRIMARY]
) ON [PRIMARY]

GO

SET ANSI_PADDING OFF
GO

ALTER TABLE [Guarantor] ADD  CONSTRAINT [DF__Guarantor__Guara__45F365D3]  DEFAULT (newid()) FOR [GuarantorId]
GO

ALTER TABLE [Guarantor] ADD  CONSTRAINT [DF_Guarantor_ImportDate]  DEFAULT (getdate()) FOR [ImportDate]
GO

ALTER TABLE [Guarantor]  WITH CHECK ADD  CONSTRAINT [FK_Guarantor_Person] FOREIGN KEY([GuarantorId])
REFERENCES [Person] ([PersonId])
GO

ALTER TABLE [Guarantor] CHECK CONSTRAINT [FK_Guarantor_Person]
GO
/****** Object:  Index [PK__Guarantor__44FF419A]    Script Date: 7/18/2013 8:17:43 AM ******/
ALTER TABLE [PatientFirst].[Guarantor] ADD  CONSTRAINT [PK__Guarantor__44FF419A] PRIMARY KEY CLUSTERED 
(
    [GuarantorId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 95) ON [PRIMARY]
GO


/****** Object:  Index [IDX_Guarantor_CenterNumber]    Script Date: 7/18/2013 8:16:40 AM ******/
CREATE NONCLUSTERED INDEX [IDX_Guarantor_CenterNumber] ON [PatientFirst].[Guarantor]
(
    [CenterNumber] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO


/****** Object:  Index [IDX_Guarantor_CenterNumberGuarantorNumber]    Script Date: 7/18/2013 8:16:45 AM ******/
CREATE UNIQUE NONCLUSTERED INDEX [IDX_Guarantor_CenterNumberGuarantorNumber] ON [PatientFirst].[Guarantor]
(
    [CenterNumber] ASC,
    [GuarantorNumber] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 95) ON [PRIMARY]
GO


/****** Object:  Index [IDX_Guarantor_GuarantorNumber]    Script Date: 7/18/2013 8:17:01 AM ******/
CREATE NONCLUSTERED INDEX [IDX_Guarantor_GuarantorNumber] ON [PatientFirst].[Guarantor]
(
    [GuarantorNumber] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO


/****** Object:  Index [IDX_Guarantor_GUNumber]    Script Date: 7/18/2013 8:17:09 AM ******/
CREATE NONCLUSTERED INDEX [IDX_Guarantor_GUNumber] ON [PatientFirst].[Guarantor]
(
    [GUNumber] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 95) ON [PRIMARY]
GO


/****** Object:  Index [IDX_Guarantor_ImportDate]    Script Date: 7/18/2013 8:17:21 AM ******/
CREATE NONCLUSTERED INDEX [IDX_Guarantor_ImportDate] ON [PatientFirst].[Guarantor]
(
    [ImportDate] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 95) ON [PRIMARY]
GO

/****** Object:  Index [IDX_Guarantor_IndustrialClientNumber]    Script Date: 7/18/2013 8:17:30 AM ******/
CREATE NONCLUSTERED INDEX [IDX_Guarantor_IndustrialClientNumber] ON [PatientFirst].[Guarantor]
(
    [IndustrialClientNumber] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 95) ON [PRIMARY]
GO

The views used in the queries are just select * from the underlying tables.

Execution Plan for the like operator:

https://docs.google.com/file/d/0B4uS27e7ZMfCOTl5LS1UaHBSOTQ/edit?usp=sharing

Patindex plan: https://docs.google.com/file/d/0B4uS27e7ZMfCVTljNml4R0xjT28/edit?usp=sharing

BlackICE
  • 8,816
  • 3
  • 53
  • 91
  • Make a workload with the slow query, run it through the Database Engine Tuning Advisor, and see what it suggests. – Craig Stuntz May 30 '13 at 18:43
  • It's not the indexes, I've already done that, and the like operator returns fine, but patindex does not. – BlackICE May 30 '13 at 19:27
  • What's the string you're searching for? – Richard Deeming Jul 17 '13 at 18:40
  • I'm searching for "13100%" – BlackICE Jul 17 '13 at 18:54
  • In that case, the query using `LIKE` will be able to use an index seek to find the value, whereas the version using `PATINDEX` will be stuck using an index scan. – Richard Deeming Jul 17 '13 at 18:57
  • Could you try using `.StartsWith("13100")` instead? – Richard Deeming Jul 17 '13 at 19:00
  • No, that was a simple example, they do things like %3%00%, that will only translate to patindex. – BlackICE Jul 17 '13 at 19:03
  • How does the performance of the two queries compare if you search for `%3%00%`? – Richard Deeming Jul 17 '13 at 19:05
  • Would you be able to post the definition of the indexes on both tables, and the actual execution plan of the queries? – Richard Deeming Jul 17 '13 at 19:26
  • performance compare about the same as when I search for 13100%, the PATINDEX is far slower. – BlackICE Jul 18 '13 at 11:53
  • added the index definitions to the question body, in the same section with the table defs – BlackICE Jul 18 '13 at 12:21
  • Looking at the two execution plans, it's quite obvious why the performance is so different. The `Like` plan starts at the `Guarantor` table, performs an index seek with a residual predicate, and then joins to the `Person` table. The `PatIndex` plan starts at the `Person` table, joins to the `Guarantor` table, and then uses a filter to satisfy the `PatIndex` condition. What's not obvious is if there's any way to fix it. – Richard Deeming Jul 18 '13 at 13:46

1 Answers1

0

I suspect the performance of both queries for a "contains" search (eg: %3%00%) will be equally bad. However, if the majority of your searches include a prefix (eg: 131%00%), you might be able to improve the performance by adding a StartsWith condition.

For example:

var query = context.Guarantors.Where(g => g.CenterNumber == centerNumber);

int index = guarantorNumber.IndexOfAny(new[]{ '%', '_', '[' });
switch (index)
{
    case -1:
    {
        query = query.Where(g => g.GuarantorNumber == guarantorNumber);
        break;
    }
    case 0:
    {
        // Nothing we can do here. The query performance will suck.
        query = query.Where(g => SqlFunctions.PatIndex(guarantorNumber, g.GuarantorNumber) > 0);
        break;
    }
    default:
    {
        string prefix = guarantorNumber.Substring(0, index);
        query = query.Where(g => g.GuarantorNumber.StartsWith(prefix));
        if (index != guarantorNumber.Length - 1 || guarantorNumber[index] != '%')
        {
            query = query.Where(g => SqlFunctions.PatIndex(guarantorNumber, g.GuarantorNumber) > 0);
        }
        break;
    }
}

NB: You could also try updating your statistics. The execution plan is showing a significant difference between the estimated and actual rows for both tables.

  • Guarantor: Estimated 7188, actual 107345
  • Person: Estimated 7102, actual 6858694

From the huge difference, I'm guessing you have the Auto Update Statistics option turned off, and you haven't scheduled a job to manually update the statistics.

Try the following commands:

UPDATE STATISTICS Person;
UPDATE STATISTICS Guarantor;

That should give SQL enough information to choose a much saner execution plan.

UPDATE STATISTICS topic on MSDN

Richard Deeming
  • 29,830
  • 10
  • 79
  • 151
  • the patindex is considerable worse, 2:50 vs 0:37. Everything I've read indicates that patindex should be more performant, not less. – BlackICE Jul 18 '13 at 14:22
  • @BlackICE: I'm not sure where you've read that `PatIndex` would be better than `Like` - if anything, I'd expect it to be the other way round. If you look at the query plans, you'll see that the `Like` query (with a prefix) is able to read a small part of the index on the `Guarantor` table, whereas the `PatIndex` query has to read the whole thing. – Richard Deeming Jul 18 '13 at 14:29
  • @BlackICE: Also, try updating your statistics. I've edited my answer to explain. – Richard Deeming Jul 18 '13 at 14:30
  • http://stackoverflow.com/questions/8052425/sql-string-comparison-speed-like-vs-patindex http://blogs.msdn.com/b/deepakbi/archive/2012/07/25/like-vs-patindex.aspx – BlackICE Jul 18 '13 at 16:09
  • I ran the update statistics even though I was pretty sure I had run them already, little difference, but that may have been caching, I got results where like operator query returned sub second, patindex returned ~5 minutes. – BlackICE Jul 18 '13 at 16:11
  • I'm at least as interested in the part where the ROW_NUMBER OVER is the performance killer. I'm really stuck with using patindex, i'd like to know why restricting the result set with the row_number outer query makes performance worse. – BlackICE Jul 18 '13 at 16:13
  • @BlackICE: I don't think the `ROW_NUMBER` is the problem. If you clear the plan cache ([DBB FREEPROCCACHE](http://msdn.microsoft.com/en-us/library/ms174283.aspx)) and run the `Like` query with a non-prefix search term (eg: `%3%00%`), you should see the same slow performance as the `PatIndex` query. – Richard Deeming Jul 18 '13 at 17:14