Problem with SQL Server or SQL query OR both

Question

I have SQL query like this:

DECLARE @cdate1 date = '20200401 00:00:00'
DECLARE @cdate2 date = '20200630 23:59:59'
         
SELECT DISTINCT ([hf].[id]) 
FROM ((([hf]  
JOIN [pw] AS [PP] ON [PP].[identity] = [hf].[id] 
                  AND [PP].[type] = 67 
                  AND [PP].[ideletestate] = 0  
                  AND [PP].[datein] = (SELECT MAX([datein]) 
                                       FROM [pw] 
                                       WHERE [pw].[identity] = [hf].[id] 
                                         AND [pw].[ideletestate] = 0 
                                         AND [pw].[type] = 67))
JOIN [px] ON [px].[idpaper] = [PP].[id] 
          AND [px].[ideletestate] = 0 
          AND [px].[type] = 30036 
          AND [px].[nazvanie] NOT LIKE '')
JOIN [pw] ON ([pw].[identity] = [hf].[id] 
          AND ([pw].[id] > 0) 
          AND ([pw].[ideletestate] = 0) 
          AND ([pw].[type] IN (16, 2, 3012, 19, 3013))) 
LEFT JOIN [px] AS [px102] ON [px102].[idpaper] = [pw].[id] 
                          AND [px102].[type] = 102 
                          AND [px102].[ideletestate] = 0)  
WHERE 
    (([pw].[idcompany] in (12461, 12466,  12467, 12462, 12463, 13258)) OR      
    ([pw].[idcompany2] in (12461, 12466,  12467, 12462, 12463, 13258)) OR
    ([px102].[idcompany] in (12461, 12466,  12467, 12462, 12463, 13258)) ) AND  
[pw].[datein] >= @cdate1  AND [pw].[datein] <= @cdate2

It works fine, but if I print it like this ...AND [pw].[datein] >= '20200401 00:00:00' AND [pw].[datein] <= '20200630 23:59:59', it work very slowly. 10 minutes vs 1 sec.

One more strange, if i use first date '20200101 00:01:00' it work fast too. If date more then 10 March 2020, it work very slow (if date like string in query, if variable it work good).

Do I have a bad query? But why do it work with variable? Or is it some issue with SQL Server?

Indexes and query execution plans... You'd need a better understanding of how SQL actually plans and retrieves data to get a good understanding here. If I were you I'd look at the suggested indexes, and see if there is something that makes sense there. Also, generally avoid multiple SELECT statements and stick to JOINs, imo. — pkr, Jul 03 '20 at 13:10

allmhuran · Answer 1 · 2020-07-04T08:56:08.310

This looks like a statistics problem.

SQL server will build a histogram of the values in a table to give it some idea what kind of query plan to create.

For example, if you have a table T with a million rows in it, but the value of column C is always 1, and then you do select * from T with C = 1, the engine will choose a plan that expects to get a lot of rows returned, because the histogram says "it is statistically likely that this table contains a hell of a lot of rows where C = 1"

Alternatively, if you have a table T with a million rows in it, but the value of column C is never 1, then the histogram tells the engine "very few rows are likely to be returned for the query select * from T where C = 1 so pick a plan optimized for a small number of rows".

A problem can arise when the values in a column have significantly changed, but the histogram (statistics) have yet to be updated. Then SQL might pick a plan based on the histogram, where a different plan would have been much better. In your case, the histogram may not indicate to the engine that there are any values greater than about the 10th of March 2020. Statistics issues are fairly common with dates, because you are often inserting getdate(), which means newer rows in the table will contain values that have never been seen before, and thus won't be covered by the histogram until it gets updated.

SQL will automatically update statistics based on a number of different triggers (this is an old article, newer versions of the engine may have changed slightly), as long as you have auto update statistics enabled in the database settings.

You can find out whether this is the issue by forcing SQL to update statistics on your table. Statistics can be refreshed by either fully scanning the table, or sampling it. Sampling is much faster for large tables, but the result won't be as accurate.

To find out whether statistics is the problem in your case, do:

update statistics PW with fullscan

Thanks, i will try this. But don't understand why so difference between date as string and date as variable as date(datetime). — Sergey, Jul 04 '20 at 11:49
From what I see in your post, the values in your variables are not identical to the values you hardcoded. But if you also see this difference when the values are identical, then you probably have a cached plan for the ad hoc query. You can force SQL not to use a cached plan by adding `option (recompile)` to the end of your query. — allmhuran, Jul 04 '20 at 12:07

Problem with SQL Server or SQL query OR both

1 Answers1