1

We have query which is slow in production(for some internal reason),

SELECT T2.Date
FROM Table1 T1
OUTER APPLY 
(
    SELECT TOP 1 T2.[DATE] 
    FROM Table2 T2 
    WHERE T1.Col1 = T2.Col1 
    AND T1.Col2 = T2.Col2 
    ORDER BY T2.[Date] DESC
) T2

But when I convert to LEFT JOIN it become fast,

SELECT Max(T2.[Date])
FROM Table1 T1
LEFT JOIN Table2 T2
   ON T1.Col1 = T2.Col1 
   AND T1.Col2 = T2.Col2
GROUP BY T1.Col1, T1.Col2

Can we say that both queries are equal? If not then how to convert it properly.

JohnLBevan
  • 22,735
  • 13
  • 96
  • 178
Imran Qadir Baksh - Baloch
  • 32,612
  • 68
  • 179
  • 322
  • In addition, the logic suggested seems to rely on some assumptions. Using an aggregate should reduce the number of rows returned assuming the logic is analogous. Is it just luck based on the available data that you seem to think these are the same? Did you try this and compare the results? – SMor Oct 17 '17 at 14:59
  • In your first query, the resultset will contain one row for every row in T1. In your second query, your aggregate will reduce the resultset if there are duplicated values of Col1 and Col2 in T1. So - logically they are not the same without assumptions (or DDL that demonstrates that this cannot occur). – SMor Oct 17 '17 at 15:02
  • They aren't fully equal. In case you have duplicates in Table1, the first query will return more records – Denis Rubashkin Oct 17 '17 at 15:03
  • 1
    And lastly - this query seems vastly more complicated than needed. So the better path is to define your goal rather than ask others to review your current code without understanding your schema and your purpose. – SMor Oct 17 '17 at 15:04
  • @DenisRubashkin thanks. In my case I am sure that T1.Col1, T1.Col2 are unique. – Imran Qadir Baksh - Baloch Oct 17 '17 at 15:05
  • NB: This has a code smell; i.e. grouping by `T1.Col1` & `T1.Col2` (/ returning one date value per Table1 row) without returning those columns. i.e. It may be OK depending on what you plan to do with those dates, but seems odd to return a list of dates without returning something to give those rows any context. – JohnLBevan Oct 17 '17 at 15:16

1 Answers1

1

The queries are not exactly the same. It is important to understand the differences.

If t1.col1/t1.col2 are duplicated, then the first query returns a separate row for each duplication. The second combines them into a single row.

If either t1.col1 or t1.col2 are NULL, then the first query will return NULL for the maximum date. The second will return a row and the appropriate maximum.

That said, the two queries should have similar performance, particularly if you have an index on table2(col1, col2, date). I should note that under some circumstances the apply method is faster than joins, so relative performance depends on circumstances.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786