0

Similar questions have been asked earlier, and I did perform my research; e.g. here, here, here and here.

However what we saw recently was quite boggling. We have a very complex query for some of our Invoice reconciliation that we were trying to optimise:

WITH cte1 AS
(
    SELECT a.* 
    FROM APInvoices a WITH (NOLOCK)
    JOIN Profiles p WITH (NOLOCK) ON a.ProfileNumber = p.ProfileNumber 
                                  AND Round(a.Balance, 2) <> 0 
),
cte2 AS
(
    SELECT
        a.InvoiceID AS ID, a.OriginalInvoiceDetailID, a.ARInvoiceID, 'S' AS Type, 'I' AS Department, 
        a.APTransactionType, a.InvoiceNumber, a.BranchNumber, a.ProfileNumber, 
        a.InvoiceDate, a.Description, a.PayableAccount AS AccountNumber, a.InvoiceAmount AS InvoiceAmount, a.Balance AS Amount, 
        a.CurrencyCode, ISNULL(a.ExchangeRate, 1) AS ExchangeRate, --Mike: Can we trap for that, i.e. case arinvoiceblaances.exchangerate = Null then 1 else exchangerate? 
        ISNULL(d.TicketNumber, '')  AS TicketNumber,
        a.ProfileName AS FullName, 
        a.CashSelected, a.ChequeSelected, a.OnHold, a.DueDate, 
        Profiles.ProfileType, 
        ISNULL(d.PassengerName, '') AS PassengerName, 
        d.TransactionDate, 
        d.TravelDate, d.ReturnDate, d.VatOnCommission AS VatOnCommission, d.ReservationID, 
        d.ChainCode, d.ProductCode, d.SubmitTo, d.VoucherStatus, 
        d.InvoiceDetailID, d.CommissionAmount, d.GrossAmount, d.GroupCode, d.PstOnCommission, 
        d.TransactionType, d.CityCode, d.TransactionCode
    FROM 
        APInvoices a WITH (NOLOCK)
    JOIN 
        Profiles WITH (NOLOCK) ON a.ProfileNumber = Profiles.ProfileNumber 
    JOIN 
        ARInvoiceDetails d WITH (NOLOCK) ON d.InvoiceID = a.ARInvoiceID 
    WHERE
        d.TransactionType IN (1, 3)
        AND ((d.InvoiceDetailID = a.ARInvoiceDetailID)
             OR
             (a.ARInvoiceDetailID = 0
              AND a.ARInvoiceID = d.InvoiceID
              AND a.TicketNumber <> '''' 
              AND a.TicketNumber IS NOT NULL 
              AND a.TicketNumber = d.TicketNumber
              AND a.CurrencyCode = d.CurrencyCode
              AND (
                    CASE WHEN (LEN(a.ProfileNumber) = 8 AND a.ProfileNumber = d.VendorNumber) OR LEN(a.ProfileNumber) = 6
                        THEN 1
                        ELSE 0 
                    END) = 1
            )
            OR
            (
                a.ARInvoiceDetailID = 0
                AND a.ARInvoiceID = d.InvoiceID
                AND (a.TicketNumber = '''' OR a.TicketNumber IS NULL)
                AND a.CurrencyCode = d.CurrencyCode 
                AND (
                    CASE WHEN (LEN(a.ProfileNumber) = 8 AND a.ProfileNumber = d.VendorNumber) OR LEN(a.ProfileNumber) = 6
                        THEN 1
                        ELSE 0
                    END) = 1
            )
        )
    
    UNION

    SELECT  a.InvoiceID AS ID, a.OriginalInvoiceDetailID, a.ARInvoiceID, 'S' AS Type, 'I' AS Department,
            a.APTransactionType, a.InvoiceNumber, a.BranchNumber, a.ProfileNumber,
            a.InvoiceDate, a.Description, a.PayableAccount AS AccountNumber, a.InvoiceAmount AS InvoiceAmount, a.Balance AS Amount, 
            a.CurrencyCode, ISNULL(a.ExchangeRate, 1) AS ExchangeRate, --Mike: Can we trap for that, i.e. case arinvoiceblaances.exchangerate = Null then 1 else exchangerate? 
            ISNULL(a.TicketNumber, '')  AS TicketNumber,
            a.ProfileName AS FullName, 
            a.CashSelected, a.ChequeSelected, a.OnHold, 
            a.DueDate, 
                
            Profiles.ProfileType,

            '' AS PassengerName, 
            NULL as TransactionDate, 
            NULL as TravelDate, NULL as ReturnDate, NULL as VatOnCommission, 0 as ReservationID,
            '' as ChainCode, NULL as ProductCode, NULL as SubmitTo, NULL as VoucherStatus,
            NULL as InvoiceDetailID, NULL as CommissionAmount, NULL as GrossAmount, '' as GroupCode, NULL as PstOnCommission, 

            NULL as TransactionType,
            '' as CityCode,
            '' as TransactionCode

    FROM APInvoices a WITH (NOLOCK)
        JOIN Profiles WITH (NOLOCK) ON a.ProfileNumber = Profiles.ProfileNumber 
    WHERE  a.arinvoiceid = 0
) 
SELECT DISTINCT TOP (2) t2.* 
FROM cte2 t2
WHERE  (t2.ProfileType = 2 OR t2.ProfileType = 3) 
     AND ((t2.SubmitTo = 2 AND t2.VoucherStatus = 0) OR t2.SubmitTo <> 2)  AND Len(t2.profilenumber) = 8  AND t2.ProfileType = 2  AND t2.TicketNumber = '12345';

The execution time for the above query was well over 30 seconds (the default Sql timeout in .Net), when it produced hardly 0-2 result rows.

Trying to optimise the above query, all we accidently did was to introduce variables:

declare @profilenumberlength varchar(10)
declare @profileType int
declare @ticketNum varchar(10)
set @profilenumberlength = 8
set @profileType = 2
set @ticketNum = '12345';

WITH cte1 AS
(
       SELECT a.*
       FROM APInvoices a WITH (NOLOCK)
             JOIN Profiles p WITH (NOLOCK) ON a.ProfileNumber = p.ProfileNumber
        AND Round(a.Balance, 2) <> 0
),
cte2 AS
(
       SELECT a.InvoiceID AS ID, a.OriginalInvoiceDetailID, a.ARInvoiceID, 'S' AS Type, 'I' AS Department,
                    a.APTransactionType, a.InvoiceNumber, a.BranchNumber, a.ProfileNumber,
                    a.InvoiceDate, a.Description, a.PayableAccount AS AccountNumber, a.InvoiceAmount AS InvoiceAmount, a.Balance AS Amount,
                    a.CurrencyCode, ISNULL(a.ExchangeRate, 1) AS ExchangeRate, --Mike: Can we trap for that, i.e. case arinvoiceblaances.exchangerate = Null then 1 else exchangerate?
                    ISNULL(d.TicketNumber, '')  AS TicketNumber,
                    a.ProfileName AS FullName,
                    a.CashSelected, a.ChequeSelected, a.OnHold,
                    a.DueDate,

                    Profiles.ProfileType,

                    Isnull(d.PassengerName, '') AS PassengerName,
                    d.TransactionDate,
                    d.TravelDate, d.ReturnDate, d.VatOnCommission AS VatOnCommission, d.ReservationID,
                    d.ChainCode, d.ProductCode, d.SubmitTo, d.VoucherStatus,
                    d.InvoiceDetailID, d.CommissionAmount, d.GrossAmount, d.GroupCode, d.PstOnCommission,

                    d.TransactionType,
                    d.CityCode,
            d.TransactionCode

       FROM APInvoices a WITH (NOLOCK)
             JOIN Profiles WITH (NOLOCK) ON a.ProfileNumber = Profiles.ProfileNumber
             JOIN ARInvoiceDetails d WITH (NOLOCK) ON d.InvoiceID = a.ARInvoiceID

       WHERE
             d.TransactionType IN (1, 3)
             AND
             (
                    (d.InvoiceDetailID = a.ARInvoiceDetailID)
                    OR
                    (
                           a.ARInvoiceDetailID = 0
                           AND a.ARInvoiceID = d.InvoiceID
                           AND a.TicketNumber <> '''' AND a.TicketNumber IS NOT NULL AND a.TicketNumber = d.TicketNumber
                           AND a.CurrencyCode = d.CurrencyCode
                           AND (
                                 CASE WHEN (LEN(a.ProfileNumber) = 8 AND a.ProfileNumber = d.VendorNumber) OR LEN(a.ProfileNumber) = 6
                                        THEN 1
                                        ELSE 0
                                 END) = 1
                    )
                    OR
                    (
                           a.ARInvoiceDetailID = 0
                           AND a.ARInvoiceID = d.InvoiceID
                           AND (a.TicketNumber = '''' OR a.TicketNumber IS NULL)
                           AND a.CurrencyCode = d.CurrencyCode
                           AND (
                                 CASE WHEN (LEN(a.ProfileNumber) = 8 AND a.ProfileNumber = d.VendorNumber) OR LEN(a.ProfileNumber) = 6
                                        THEN 1
                                        ELSE 0
                                 END) = 1
                    )
             )

       UNION

       SELECT a.InvoiceID AS ID, a.OriginalInvoiceDetailID, a.ARInvoiceID, 'S' AS Type, 'I' AS Department,
                    a.APTransactionType, a.InvoiceNumber, a.BranchNumber, a.ProfileNumber,
                    a.InvoiceDate, a.Description, a.PayableAccount AS AccountNumber, a.InvoiceAmount AS InvoiceAmount, a.Balance AS Amount,
                    a.CurrencyCode, ISNULL(a.ExchangeRate, 1) AS ExchangeRate, --Mike: Can we trap for that, i.e. case arinvoiceblaances.exchangerate = Null then 1 else exchangerate?
                    ISNULL(a.TicketNumber, '')  AS TicketNumber,
                    a.ProfileName AS FullName,
                    a.CashSelected, a.ChequeSelected, a.OnHold,
                    a.DueDate,

                    Profiles.ProfileType,

                    '' AS PassengerName,
                    NULL as TransactionDate,
                    NULL as TravelDate, NULL as ReturnDate, NULL as VatOnCommission, 0 as ReservationID,
                    '' as ChainCode, NULL as ProductCode, NULL as SubmitTo, NULL as VoucherStatus,
                    NULL as InvoiceDetailID, NULL as CommissionAmount, NULL as GrossAmount, '' as GroupCode, NULL as PstOnCommission,

                    NULL as TransactionType,
                    '' as CityCode,
            '' as TransactionCode

       FROM APInvoices a WITH (NOLOCK)
             JOIN Profiles WITH (NOLOCK) ON a.ProfileNumber = Profiles.ProfileNumber
       WHERE  a.arinvoiceid = 0
)

SELECT DISTINCT TOP (2) t2.*
FROM cte2 t2
WHERE  (t2.ProfileType = 2 OR t2.ProfileType = 3)
        AND ((t2.SubmitTo = 2 AND t2.VoucherStatus = 0) OR t2.SubmitTo <> 2)  AND Len(t2.profilenumber) = @profileNumberLength  AND t2.ProfileType = @profileType  AND t2.TicketNumber = @ticketNum;

There is absolutely no change in the query logic itself, just that inline constant values in WHERE clause are replaced by variables declared before-hand. This returned in like < 1 second. SQL Server Management Studio showed an execution time of 00:00:00.

If I add a "OPTION (RECOMPILE)" hint to the second query above (which was performing way better), leaving the variables intact, the performance degrades to the same as that of the first query. The above performance results were consistent over hundreds of executions in a loop for each query variant.

Now I do understand from the literature SQL Server re-creates the execution plans for inline constant values while cached plans are used if the query contained variables. I also understand in some cases, inline constant values can generate better plans as the Query Optimiser can guess the number of rows better upfront if it has actual values during plan generation.

What baffles me is the time taken to generate the Execution plan itself. Is it normal for SQL Server to take 30+ seconds in generating an execution plan only? Please note the queries executed on a production strength server, 32 cores, 70 gig RAM (dedicated to SQL Server, there's more RAM for OS and other apps), with CPU utilisation hovering between 0-3% at the time tests were run (during late night hours).

3 tables in the query contained between 1 - 1.2 million rows each, all other participating tables hardly had a few hundred. The database in question itself is around 10 gigs.

Charlieface
  • 52,284
  • 6
  • 19
  • 43
r_honey
  • 883
  • 4
  • 15
  • 31
  • So you are seeing exactly the same execution plan both times? Can you upload both to https://brentozar.com/pastetheplan. Please also add to your post the definitions of tables and indexes – Charlieface Jan 02 '22 at 09:07
  • 2
    Queries for finacial data where, presumably, you want accurate results, should not be littered with [Nolock](https://www.brentozar.com/archive/2021/11/nolock-is-bad-and-you-probably-shouldnt-use-it/) – Stu Jan 02 '22 at 09:45
  • Just to bang tor `NOLOCK` drum harder: [Bad habits : Putting NOLOCK everywhere](https://www.sentryone.com/blog/aaronbertrand/bad-habits-nolock-everywhere) – Thom A Jan 02 '22 at 11:02
  • Also [Bad Habits to Kick : Using table aliases like (a, b, c) or (t1, t2, t3)](https://sqlblog.org/2009/10/08/bad-habits-to-kick-using-table-aliases-like-a-b-c-or-t1-t2-t3) – Thom A Jan 02 '22 at 11:04
  • Thanks @Larnu for the tips. As for NOLOCK, did you notice this query was being played around to arrive at a better performing query. It contains a CTE (cte1) which is not referenced in the query itself. We were trying to optimise and play with parts. And yes, the first iteration did not had NOLOCK, it was added later to see if it made a difference. Further when the tests were being run, the system was practically under no load, no DML operations were running. For table aliases too, the actual prod queries we write are cleaned up after the fiddling around is done. Thanks again though!! – r_honey Jan 02 '22 at 13:46
  • Thanks @Stu for the feedback. As per my comment above, we started with regular SELECT statements adding NOLOCK later to see if it made a difference. – r_honey Jan 02 '22 at 13:47
  • Hi @Charlieface, lemme fetch and post what you mentioned. I will do that and confirm again. – r_honey Jan 02 '22 at 13:47
  • 2
    `NOLOCK` is not a "go-faster" switch, it's a "give-incorrect-results" switch, it has huge implications for data integrity. If you want to avoid locking issues, use `SNAPSHOT`, or to increase performance just use full table locks `WITH (TABLOCK)` (obviously at the cost of concurrency which is not a concern if there is no DML). It makes no sense to use `NOLOCK` for that, because if there is concurrent DML you have integrity issues, and if there isn't, `TABLOCK` will suffice. These are all side issues to the real thing we need: execution plans and index definitions – Charlieface Jan 02 '22 at 13:56
  • @Charlieface basically said it, r_honey, `NOLOCK` is a performance hint. Yes, it's likely to avoid deadlocks but it does so at the expense of accuracy, sometimes severely so. If you really do "need" `NOLOCK` against all your tables (doubtful in this scenario) you should be changing the isolation level of the entire transaction. – Thom A Jan 02 '22 at 14:03
  • Thanks @Charlieface and Larnu. I would take your points on NOLOCK. As regards the data, here are the relevant files: Query 1 (https://www.rahulsingla.com/files/Query%201%20(Slow).zip), Query 2 (https://www.rahulsingla.com/files/Query%202%20(Fast).zip), Schema (https://www.rahulsingla.com/files/Schema.sql). The zip files contains Client Statistics, Live Query Statistics and Actual Execution Plan for both the queries. – r_honey Jan 02 '22 at 14:14
  • Please do as asked in the first comment which is to upload execution plan xml to PasteThePlan and add other info eg table structures, indexes and any other info to your question as scripted *create* statements – Stu Jan 02 '22 at 14:31
  • Sorry not opening random zip files from random websites. Please share plans via PasteThePlan, please [edit] your post and add *relevant* schema definitions – Charlieface Jan 02 '22 at 14:42
  • The slow plan you uploaded doesn't show any sign that it took a long time to compile. CompileTime was only 38ms. Execution time was 34.665 seconds though, it does a ton of key lookups as estimated number of rows and actual number of rows is massively different. The fast plan may well just be faster due to luck that it happens to have the operators in a different arrangement that happens to narrow things down to 0 rows much earlier – Martin Smith Jan 02 '22 at 15:12
  • 1
    Your slow query has a bad plan, try updating statistics on the tables involved or rebuilding the clustered indexes on APInvoices and ARInvoiceDetails. – Stu Jan 02 '22 at 15:44
  • 2
    Even in the fast plan the predicate on the first table accessed is unsargable and it has to do a scan (`isnull([TSTDEV].[dbo].[ARInvoiceDetails].[TicketNumber] as [d].[TicketNumber],'')=[@ticketNum]`) - you should remove the `ISNULL(a.TicketNumber, '') AS TicketNumber` and just return `TicketNumber` so the predicate on that column is seekable. This may encourage the slow plan to evaluate that first as it happens to do by luck in the faster plan – Martin Smith Jan 02 '22 at 15:50
  • Thanks @Stu for the tip. I updated statistics on all 4 tables (Profiles, APInvoices, ARInvoices, and ARInvoiceDetails) using "UPDATE STATISTICS TableName WITH FULLSCAN, COLUMNS", and it made no impact to the performance of the slow query. Kept taking 31-40 seconds. Dropped and re-created clustered index on all 4 tables, no improvement :'( – r_honey Jan 03 '22 at 07:46
  • @MartinSmith removed the TicketNumber ISNULL check "after" performing statistics update and clustered index re-creation. The time came down to 26-27 seconds in 4-5 tests (stabilised at 26 seconds afterwards). Still it is no-where near <1 second consistently delivered by query with variables. I am trying to get to the bottom of this as I would really love to understand how just variable introduction can improve a query so drastically and consistently. – r_honey Jan 03 '22 at 07:50
  • @Charlieface really appreciate your help. There were multiple files per query I wanted to share, hence zip seemed the safest option. Would you like these files to be shared in some sort of drive? In any case, here are the plans: https://www.brentozar.com/pastetheplan/?id=BkL3i7x2F (slow), and https://www.brentozar.com/pastetheplan/?id=BJ1Z27l2Y – r_honey Jan 03 '22 at 07:55
  • 1
    Looks like you need to sort your indexing out: you have some pretty big key lookups in both those plans. The fast plan is only fast because it goes to `ARInvoiceDetail` first, it could still do with a lot of improvement. Do you need `UNION` rather than `UNION ALL`, do you need `DISTINCT`? Those also slow the query down. You also have redundant code: eg `a.ARInvoiceID = d.InvoiceID` is specified multiple times – Charlieface Jan 03 '22 at 10:51
  • @Charlieface Most of the query I inherited and I am trying to clean up everything you mentioned in co-ordination with business depending upon the feedback I get, ensuring nothing breaks down as a result of my changes. I was particularly intrigued by the impact on the execution time in the 2 scenarios considering the queries were an exact replica other than variable vs inline constant in WHERE clause. Looks like this is more of Query Optimiser internals that is hard to figure out from outside and other than optimising the query itself,figuring out why plans are different isn't going to be easy. – r_honey Jan 04 '22 at 03:23
  • This is [well documented](https://www.erikdarlingdata.com/sql-server/yet-another-post-about-local-variables/): using local variables is equivalent to putting `OPTION (OPTIMIZE FOR UNKNOWN)` on the query, and means parameter sniffing is disabled. The optimizer will choose a plan that optimizes for the *average* case, and means that often it will pick a plan that expects a higher number of rows. In your case it's gone parallel to deal with the huge mass of key lookups. Effective in isolation, but inefficient if there are other things going on with the server. .... – Charlieface Jan 04 '22 at 03:30
  • .... You *need* to get rid of those key lookups with better indexing. You need to change indexes to "cover" the query, by using `INCLUDE` columns. You haven't shown the full table and index schemas, so can;t advise further. I'm also intrigued by the fact it appears one whole side of your `UNION` has disappeared, indicating the compiler found a logical impossibility in it resulting in zero rows. It appears that `((t2.SubmitTo = 2 AND t2.VoucherStatus = 0) OR t2.SubmitTo <> 2)` is doing that, because `SubmitTo` and `VoucherStatus` are null in the second half of the union. – Charlieface Jan 04 '22 at 03:32

0 Answers0