2

I need to run a query to compare two tables and return the records that are not in the second table. The tricky part is that the link between the two table is conditional. I have multiple sources feeding into Table2 and that is then feeding into Table1. I've tried using a join, but that is not any faster than what I have below. The tables have about 50k records in them. This query takes about 1.5 minutes to complete. I'm hoping to get it to about a few second. Table1 and Table2 already have indexes on these fields. The DB is running on compatibility Level SQL2008.

This query takes about 1.5 minutes:

select *
from Table1 t1 
where not exists (select * 
                  from Table2 t2
                  where t2.seedID = case t2.SeedSource when 'SeedSource1' then t1.SeedSource1
                                                       when 'SeedSource2' then t1.SeedSource2
                                                       when 'SeedSource3' then t1.SeedSource3
                                                       when 'SeedSource4' then t1.SeedSource4 
                                                       when 'SeedSource5' then t1.SeedSource5 end)

This query takes over five minutes:

select d.*
from Tabel1 t1 left join
     Table2 t2 on t2.seedID = case t2.SeedSource when 'SeedSource1' then t1.SeedSource1 
                                                 when 'SeedSource2' then t1.SeedSource2
                                                 when 'SeedSource3' then t1.SeedSource3
                                                 when 'SeedSource4' then t1.SeedSource4
                                                 when 'SeedSource5' then t1.SeedSource5  end
where t2.seedID is NULL

Any help would be greatly appreciated.

Sandy DeLeon
  • 622
  • 6
  • 17

3 Answers3

1

Actually they are not the same query as the left join would return multiple rows on multiple matches

Not exists is the better approach

I hope SeedSource, SeedSource1-5, and seedID are indexed

select *
from Table1 t1 
where not exists ( select * 
                   from Table2 t2
                   where t2.seedID = t1.SeedSource1 
                   and   t2.SeedSource = 'SeedSource1' 
                   union all  
                   select * 
                   from Table2 t2
                   where t2.seedID = t1.SeedSource2 
                   and   t2.SeedSource = 'SeedSource2'
                   //...
                 )

maybe

left join Table2 t2
  on ( t2.seedID = t1.SeedSource1 and t2.SeedSource = 'SeedSource1' )
  or ( t2.seedID = t1.SeedSource2 and t2.SeedSource = 'SeedSource2' )
  // ...
paparazzo
  • 44,497
  • 23
  • 105
  • 176
0

After reading your requirements, your query looks fine. (It is best practice to use Not Exists instead of Not IN and Left Join)

You can do a small optimization :

  • In the WHERE clause use not exists (select 1 ... instead of not exists (select * ... it is not necessary to select all columns when you can select a constant. (better performance)

Reference

Hadi
  • 36,233
  • 13
  • 65
  • 124
  • There is no **runtime** performance difference between `select 1` and `select *` when using `exists()`. Although using `select 1` will avoid having to examine any unneeded metadata for that table during query compilation. [EXISTS Subqueries: SELECT 1 vs. SELECT - Conor Cunningham](http://www.sqlskills.com/blogs/conor/exists-subqueries-select-1-vs-select/) – SqlZim Feb 17 '17 at 21:48
0

Maybe:

SELECT t1.* FROM table1 t1 LEFT JOIN table2 t2 on t1.seedID = t2.seedID 
WHERE t2.seedID is NULL AND t1.SeedSource IN ('SeedSource1','SeedSource2','SeedSource3','SeedSource4','SeedSource5')
Hadi
  • 36,233
  • 13
  • 65
  • 124
webservices
  • 455
  • 4
  • 3
  • This doesn't working since t2.SeedSource will always be one of those values. The main thing is that the value of t2.SeedSource determines what fields from t1 will be used to join the two tables. Also, there is no t1.seedID. – Sandy DeLeon Feb 20 '17 at 15:36