
I have a query, which I did not write, that takes 2.5 minutes to run. I am trying to optimize it without being able to modify the underlying tables, i.e. no new indexes can be added.

During my optimization troubleshooting I commented out a filter and all of a sudden the query ran in 0.5 seconds. I have played with the formatting and placement of that filter: if it is there, the query takes 2.5 minutes; without it, 0.5 seconds. The biggest problem is that the filter is not on the table that is being table-scanned (over 300k records); it is on a table with only 300 records.

The "Actual Execution Plan" of both the 0:0:0.5 vs 0:2:30 are identical down to the exact percentage costs of all steps:

[Screenshot: Execution Plan]

The only difference is that, for the table-scanned table, the "Actual Number of Rows" in the 2.5-minute query shows 3.7 million rows, even though the table only has 300k rows, whereas the 0.5-second query shows an Actual Number of Rows of 2,063. The filter itself is placed on the FS_EDIPartner table, which only has 300 rows.

With the filter I get the correct 51 records, but it takes 2.5 minutes to return them. Without the filter I get duplication, so I get 2,796 rows, but they take only half a second to return.

I cannot figure out why adding the filter to a table with 300 rows and a correct index causes the table scan of a different table to report such a dramatically different actual number of rows. I am even wrapping the table-scanned table in a sub-query to filter its records down from 300k to 17k before doing the join. Here is the actual query in its current state; sorry the tables don't make a lot of sense, I could not reproduce this behavior with test data.

SELECT dbo.FS_ARInvoiceHeader.CustomerID
    , dbo.FS_EDIPartner.PartnerID
    , dbo.FS_ARInvoiceHeader.InvoiceNumber
    , dbo.FS_ARInvoiceHeader.InvoiceDate
    , dbo.FS_ARInvoiceHeader.InvoiceType
    , dbo.FS_ARInvoiceHeader.CONumber
    , dbo.FS_EDIPartner.InternalTransactionSetCode
    , docs.DocumentName
    , dbo.FS_ARInvoiceHeader.InvoiceStatus
FROM  dbo.FS_ARInvoiceHeader 
    INNER JOIN dbo.FS_EDIPartner ON dbo.FS_ARInvoiceHeader.CustomerID = dbo.FS_EDIPartner.CustomerID
    LEFT JOIN (SELECT DocumentName
                FROM GentranDatabase.dbo.ZNW_Documents
                WHERE DATEADD(SECOND, TimeCreated, '1970-1-1') > '2016-06-01'
                    AND TransactionSetID = '810') docs ON dbo.FS_ARInvoiceHeader.InvoiceNumber = docs.DocumentName COLLATE Latin1_General_BIN
WHERE docs.DocumentName IS NULL
    AND dbo.FS_ARInvoiceHeader.InvoiceType = 'I'
    AND dbo.FS_ARInvoiceHeader.InvoiceStatus <> 'Y'
    --AND (dbo.FS_EDIPartner.InternalTransactionSetCode = '810') 
    AND (NOT (dbo.FS_ARInvoiceHeader.CONumber LIKE 'CB%')) 
    AND (NOT (dbo.FS_ARInvoiceHeader.CONumber LIKE 'DM%')) 
    AND InvoiceDate > '2016-06-01'

The commented-out line in the WHERE clause is the culprit; uncommenting it causes the 2.5-minute run.

Earthworm
  • Query seems OK. Did you verify your indexes and statistics? Kindly go through its execution plan, create indexes where needed, re-build existing fragmented indexes, and update statistics. – Rajesh Ranjan Jun 23 '16 at 15:17
  • That subquery is non-SARGable. It has to perform your date calculation for every single row before it can determine whether the row matches or not. Posting the actual execution plan instead of a picture would be far more helpful. There is a lot more to tuning queries than eliminating table scans. – Sean Lange Jun 23 '16 at 15:19
  • I am actually not trying to eliminate the table scan, as I can't really, given the restriction that no indexes can be added. I am trying to figure out why the "Actual Number of Rows" in the table scan becomes so high when filtering a different table. Unfortunately that Time value is stored oddly; I suppose I could reverse the calculation and convert '2016-06-01' to an int, then compare against that for better performance (a sketch of that rewrite follows these comments), but that still doesn't solve the problem of 3.7 million rows being scanned. – Earthworm Jun 23 '16 at 15:23
  • Take a look at the "number of executions" data point when examining your query plan and see if that is greater than 1. Also, if this query is happening in a stored procedure, see whether copying records to a temp table or table variable and then adding an index is faster than going off of the raw table. – Jeremy Jun 23 '16 at 15:45
  • @Jeremy that was a data point I missed. The number of executions in the 2.5-minute version is 1792, whereas it is only 1 in the 0.5-second version. – Earthworm Jun 24 '16 at 12:07
  • @Jeremy, thank you, your comment pointed out that the problem was not the number of rows but the number of executions, which led me to this other Stack Overflow question [link](http://stackoverflow.com/questions/8460416/query-optimizer-operator-choice-nested-loops-vs-hash-match-or-merge), which helped me understand the issue better. Yours is the **correct** answer; if you would like to post it as an answer I will accept it. – Earthworm Jun 27 '16 at 15:58
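
As a sketch of the SARGable rewrite discussed in the comments above (the table and column names come from the query in the question; the variable name is illustrative, and it assumes TimeCreated stores Unix epoch seconds, as the original DATEADD expression implies):

-- Non-SARGable form: DATEADD has to run for every row in ZNW_Documents.
--   WHERE DATEADD(SECOND, TimeCreated, '1970-1-1') > '2016-06-01'

-- SARGable sketch: convert the cutoff date to epoch seconds once,
-- then compare the stored integer column directly.
DECLARE @CutoffEpoch INT = DATEDIFF(SECOND, '1970-01-01', '2016-06-01')

SELECT DocumentName
FROM GentranDatabase.dbo.ZNW_Documents
WHERE TimeCreated > @CutoffEpoch
    AND TransactionSetID = '810'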

2 Answers


It could be that the table statistics have gotten out of whack. These include the number of records each table holds, which is used to choose the best query plan. Try running this and then running the query again:

EXEC sp_updatestats
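
If a database-wide sp_updatestats pass is too heavy, a more targeted refresh of just the tables in the slow join is another option; a minimal sketch, assuming the table names from the question (updating statistics requires ALTER permission on the tables, but it does not add any new indexes):

-- Refresh statistics only on the tables involved in the slow join,
-- scanning all rows for the most accurate row-count estimates.
UPDATE STATISTICS dbo.FS_ARInvoiceHeader WITH FULLSCAN
UPDATE STATISTICS dbo.FS_EDIPartner WITH FULLSCAN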
Rono

Using @Jeremy's comment as a guideline, which pointed out that the Actual Number of Rows was not my problem but rather the number of executions, I figured out that the Hash Match plan took 0.5 seconds while the Nested Loops plan took 2.5 minutes. Trying to force the Hash Match with a LEFT HASH JOIN was inconsistent depending on what the other filters were set to; changing dates took it from 0.5 seconds to 30 seconds sometimes, so forcing the hash (which is highly discouraged anyway) wasn't a good solution. Finally I resorted to moving the poorly performing view into a stored procedure, splitting the two tables related to the poor performance out into table variables, and then joining those table variables. This resulted in the most consistently good performance. On average the SP returns in less than 1 second, which is far better than the 2.5 minutes it started at.
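
A minimal sketch of that restructuring, with several assumptions: the procedure name and column data types are illustrative, and exactly which two tables were staged is my reading (here the filtered invoice headers and the Gentran documents), not the production code.

CREATE PROCEDURE dbo.usp_GetUnsent810Invoices   -- hypothetical name
AS
BEGIN
    SET NOCOUNT ON

    -- Stage each problem table into a table variable so the final join works
    -- against two small, pre-filtered row sets instead of re-executing a scan
    -- of one table for every row of the other.
    DECLARE @Invoices TABLE (
        CustomerID    VARCHAR(20),   -- column types are assumptions; match the real schema
        InvoiceNumber VARCHAR(30),
        InvoiceDate   DATETIME,
        InvoiceType   CHAR(1),
        CONumber      VARCHAR(30),
        InvoiceStatus CHAR(1)
    )
    DECLARE @Documents TABLE (DocumentName VARCHAR(30))

    INSERT INTO @Invoices
    SELECT CustomerID, InvoiceNumber, InvoiceDate, InvoiceType, CONumber, InvoiceStatus
    FROM dbo.FS_ARInvoiceHeader
    WHERE InvoiceType = 'I'
        AND InvoiceStatus <> 'Y'
        AND CONumber NOT LIKE 'CB%'
        AND CONumber NOT LIKE 'DM%'
        AND InvoiceDate > '2016-06-01'

    INSERT INTO @Documents
    SELECT DocumentName
    FROM GentranDatabase.dbo.ZNW_Documents
    WHERE DATEADD(SECOND, TimeCreated, '1970-1-1') > '2016-06-01'
        AND TransactionSetID = '810'

    SELECT i.CustomerID, p.PartnerID, i.InvoiceNumber, i.InvoiceDate, i.InvoiceType,
           i.CONumber, p.InternalTransactionSetCode, d.DocumentName, i.InvoiceStatus
    FROM @Invoices i
        INNER JOIN dbo.FS_EDIPartner p ON i.CustomerID = p.CustomerID
        LEFT JOIN @Documents d ON i.InvoiceNumber = d.DocumentName COLLATE Latin1_General_BIN
    WHERE d.DocumentName IS NULL
        AND p.InternalTransactionSetCode = '810'
END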

@Jeremy gets the credit, but since his comment wasn't posted as an answer, I thought I would document what was actually done in case someone else stumbles across this later.

Earthworm