0

I have two tables - Table A and Table B.

Table A has 121,903 rows. Table B has only 95 rows.

I need to join Table A with Table B such that I will get first row of Table B which have matching rows with Table A order by sort criteria.

I am using the following query to get the results. It is returning results correctly but has performance issues.

;WITH [TableAB] AS
(
    SELECT * FROM #TableA A
    OUTER APPLY
    (
        SELECT TOP 1 *  FROM #TableB
        WHERE 
            ([Col1] = A.[Col1]OR [Col1]IS NULL)
        AND ([Col2] = A.[Col2]OR [Col2]IS NULL)
        AND ([Col3] = A.[Col3]OR [Col3]IS NULL)
        AND ([Col4] = A.[Col4]OR [Col4]IS NULL)
        AND ([Col5] = A.[Col5] OR [Col5] IS NULL)
        AND ([Col6] = A.[Col6]OR [Col6]IS NULL)
        AND ([Col7] = A.[Col7]OR [Col7]IS NULL)
        AND ([Col8] = A.[Col8]OR [Col8]IS NULL)
        AND ([Col9] IS NULL)
        AND ([Col10] IS NULL)
        AND ([Col11] = A.[Col11] OR [Col11] IS NULL)
        AND ([Col12] = A.[Col12]OR [Col12] IS NULL)
        AND ([Col13] = A.[Col13]OR [Col13]IS NULL)
        AND ([Col14] = A.[Col14] OR [Col14] IS NULL)
        AND ([Col15]= A.[Col15]OR [Col15]IS NULL)
        AND ([Col16] = A.[Col16] OR [Col16] IS NULL)
        AND ([Col17]= A.[Col17]OR [Col17]IS NULL)
        AND ([Col18]= A.[Col18]OR [Col18]IS NULL)
        AND ([Col19]= A.[Col19]OR [Col19]IS NULL)
        AND ([Col20] = A.[Col20] OR [Col20]IS NULL)
        ORDER BY [SortCriteria]
    ) B
)
SELECT * FROM [TableAB]

Query Plan

It currently takes ~1 minute to execute this query. Is there any way I can rewrite the query to improve the performance?

Note that it is a data warehouse system, the above query is part of a large query which uses CTE table "TableAB".

Thanks.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
developer
  • 1,401
  • 4
  • 28
  • 73
  • Maybe SQL Server thinks TableB has more rows... cause that's a large number in your execution plan (11 million)? Can you show `DBCC SHOW_STATISTICS`? Run the query with `SET STATISTICS IO ON` and past the plan here: https://www.brentozar.com/pastetheplan/ – S3S Oct 26 '18 at 16:14
  • Sure, I will run it and upload the plan. Thanks. – developer Oct 26 '18 at 16:26
  • 11m is number of rows of TableA * number of rows of TableB (121,903 * 95 = ~11m). As it is using outer apply, it joins each row of Table A with Table B to get the result. – developer Oct 26 '18 at 16:34
  • Query plan: https://www.brentozar.com/pastetheplan/?id=S12sLaenX – developer Oct 26 '18 at 16:54
  • 1
    What is the `SortCriteria`? You say you want to get rid of the sort but haven't told us that – Martin Smith Oct 26 '18 at 16:55
  • Table '#TableB_0000000405DF'. Scan count 121903, logical reads 487612, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table '#TableA_00000004063F'. Scan count 1, logical reads 12321, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. – developer Oct 26 '18 at 16:55
  • @MartinSmith - Sort Criteria determines which row to select if there are multiple matches with TableA. (It is required, I will update the question - sorry.) – developer Oct 26 '18 at 16:57
  • I wonder if using ISNULL might not be faster here. Maybe not, but worth a try. – Robert Sievers Oct 26 '18 at 18:13
  • despite you saying you were going to update the Q I dont see any useful update. I wasnt asking you to explain the concept of using a sort criteria I was asking you exactly what that is in your case. – Martin Smith Oct 27 '18 at 17:08
  • @MartinSmith I have a configuration table (Table B) which has set of configs along with its priority (integer) to determine which config to use if there are multiple matches with Table A. Example: Table A can be an Employee table having a Department ID in it. Table B can be a Course table having list of eligible courses for each department and employee designation it is applicable to and a priority column. Now, if an employee is eligible for multiple course then using priority column we can determine which course the employee should be assigned to. Priority column is the sortorder. – developer Oct 27 '18 at 18:19
  • Still very vague. Ideally you want to get an index ordered in priority order so it doesn't need to scan and sort but can just scan the index in order and stop when it reaches the first matching one but based on the lack of a good description/example code difficult to know if this is possible. – Martin Smith Oct 27 '18 at 18:28
  • yes, you are right. After adding the index (as per Tab Alleman's answer), it improved the performance. Thank you. – developer Oct 29 '18 at 14:21

1 Answers1

1

Since the bulk of the execution is spent sorting TableB, the most likely candidate for improving performance would be to add an index that covers SortCriteria and INCLUDES all the columns in TableB that are selected in the query.

Tab Alleman
  • 31,483
  • 7
  • 36
  • 52
  • Adding non clustered index that covers SortCriteria and INCLUDES all the columns in TableB that are selected in the query improved the performance. Thank you. – developer Oct 29 '18 at 14:23