0

I'm running a basic query on a table with a unique ID in each table, which is a non-clustered index in each. One table has 8 million rows, and the other has 800,000 rows.

When I run the following, it returns 24,000 rows in less than a second:

select     
    a.[ID]      
    ,b.[ID]    
from     
    dbo.tbl_1 a    
join     
    dbo.tbl_2 b    
on 
    a.unique_id = b.unique_id 

However, when I add an additional column in the join which will significantly reduce the recordset, this takes about 8 minutes

select     
    a.[ID]      
    ,b.[ID]    
from     
    dbo.tbl_1 a    
join     
    dbo.tbl_2 b    
on 
    a.unique_id = b.unique_id 
    AND a.code_letter = b.code_letter

The "code_letter" column is just one letter, and is set to varchar(1). I'm scratching my head trying to understand why this is hanging. The issue is that I've run a dynamic sql insert query with 20,000 permutations of joins, and it's taking way too long.

Edit After trying this many ways, I realized that a simple select * seems to work very efficiently, while selecting specific columns is the culprit. Here is the execution plan when I select *:

select all query

Here is the execution plan when I select specific columns:

select specific columns

Again, my join is exactly the same, but the column selections are different.

sqlnewbie1979
  • 99
  • 1
  • 11
  • Have you tried to create indexes on code_letter columns? – ivangreek May 23 '20 at 17:52
  • I have, and that did not help for some reason.....but even if it did, it would be an annoying situation where I'd have to create an index for each iteration and drop it which could also be time consuming. – sqlnewbie1979 May 23 '20 at 18:02
  • I'm wondering if it has something to do with cache-ing of the execution plan or something like that. It seems set off by the dynamic SQL that keeps runnign insert queries over and over with different columns in the join? – sqlnewbie1979 May 23 '20 at 18:11
  • For performance questions, please show query plan from `EXPLAIN`. Is the 8 minutes the result of that single query by itself or your *dynamic sql insert query* process? if latter, show fuller TSQL code. – Parfait May 23 '20 at 18:55
  • It's from the single query. This is in t-sql so I can't use explain. I bring up the dynamic part in case there's some reason to believe that would cause an unexpected issues. – sqlnewbie1979 May 23 '20 at 20:09
  • I created an exact index on the columns in the join and that didn't help. Also, the execution plan was the same. – sqlnewbie1979 May 23 '20 at 20:16

1 Answers1

1

OP said, he didn't get the expected results, and based on his observations, I'll provide a different solution.

What I would do is, execute the following and get data to a temp table

select     
    a.[ID] as aID      
    ,b.[ID] as bID
    ,a.code_letter as aCode_letter  
    ,b.code_letter as bCode_letter
into #t
from     
    dbo.tbl_1 a    
join     
    dbo.tbl_2 b    
on 
    a.unique_id = b.unique_id 

and then exec the following

Select aID, bID from #t Where aCode_letter = bCode_letter

select DISTINCT a.*    
into #ta
from     
    dbo.tbl_1 a    
join     
    dbo.tbl_2 b    
on 
    a.unique_id = b.unique_id 

select DISTINCT b.*    
into #tb
from     
    dbo.tbl_1 a    
join     
    dbo.tbl_2 b    
on 
    a.unique_id = b.unique_id 

And exec

Select a.ID, b.ID 
from #ta a
Inner Join #tb b 
on a.unique_id = b.unique_id and a.Code_letter = b.Code_letter
sqlnewbie1979
  • 99
  • 1
  • 11
Srinika Pinnaduwage
  • 1,044
  • 1
  • 7
  • 14
  • I tried that and it was unfortunately time consuming. Here's the latest I've learned: When I run: ```Select * ``` that seems to return results in seconds, whereas when I choose specific columns it spins for minutes. I've never seen that before, and it's rather unexpected behavior. – sqlnewbie1979 May 23 '20 at 21:40
  • Then why do't you do it like that. ie. Select * and then do the 2nd part separately? Or get a.* separately and b.* in to another temp tbl and then join those 2 – Srinika Pinnaduwage May 23 '20 at 21:46
  • The issue there is just I'd have to do a lot of renaming of columns. I am adding the execution plan differences to the original issue if that helps to determine why it's executing so differently. – sqlnewbie1979 May 23 '20 at 21:58
  • 1
    please try my new suggestion – Srinika Pinnaduwage May 23 '20 at 21:59
  • That does work, although it seems like a workaround that should be avoidable. Thank you! – sqlnewbie1979 May 23 '20 at 22:08
  • I have only a limited info. If it is my DB, I would have tried different options. Unfortunately I have to base only on what you have provided here. – Srinika Pinnaduwage May 23 '20 at 22:44
  • I added "DISTINCT" to the select since otherwise you can run into a cartesian join situation. – sqlnewbie1979 May 24 '20 at 21:24
  • @sqlnewbie1979, As I told you, I do not have a way to experiment and test. i was under the assumption that the data you will get are unique from each table, as you were taking all records. If it is not the case, yes, better to have distinct. How is the performance after adding DISTINCT? – Srinika Pinnaduwage May 24 '20 at 21:54
  • Yes, much better and your solution was extremely helpful. – sqlnewbie1979 May 25 '20 at 22:42
  • For future reference, what else would be helpful to provide? – sqlnewbie1979 May 25 '20 at 22:43
  • For this type of a situation it is hard to provide much. If it is something where i can experiment for myself, i would have tried other options as to why all columns are faster than few columns, why need to have distinct and so on. Its hard to tell how i would do. – Srinika Pinnaduwage May 26 '20 at 01:23