6

I came across this very odd situation, and I thought I would throw it out to the crowd to find out why.

I have a query that was joining a table on a linked server:

select a.*, b.phone
from table_a a
join remote.table_b b on b.id = a.id
(There are lots of rows in A, but very few in B.)

This query was taking forever (I never even found out the actual run time), and that is when I noticed B had no index. I added one, but that didn't fix the issue. Finally, out of desperation, I tried:

select a.*, b.phone
from table_a a
join (select id, phone from remote.B) as b on b.id = a.id

This version of the query, in my mind at least, should return the same results, but lo and behold, it responds immediately!

Any ideas why one would hang and the other process quickly? And yes, I did wait to make sure the index had been built before running both.

Yuck
Limey

4 Answers

4

It's because the execution plans that the SQL Server engine generates automatically are sometimes (quite often, in fact) not as good or as obvious as we would like. Look at the execution plan in both cases. I suggest using a join hint in the first query, something like INNER MERGE JOIN.

Here is some more information about that:

http://msdn.microsoft.com/en-us/library/ms181714.aspx
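As a rough sketch, this is what the hint could look like when applied to the query from the question. The table and column names are copied from the question, and `remote.table_b` is assumed to stand in for the linked-server table. Note that specifying a join hint also makes the optimizer keep the join order as written, so this is best treated as an experiment to compare plans rather than a guaranteed fix.

```sql
-- Sketch only: names come from the question; remote.table_b is assumed
-- to be the table reached through the linked server.
SELECT a.*, b.phone
FROM table_a AS a
INNER MERGE JOIN remote.table_b AS b   -- MERGE is the join hint
    ON b.id = a.id;
```

Swapping `MERGE` for `HASH` or `LOOP` in the same position is a simple way to see how each join type behaves against the remote table.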

devarc
  • This is a nice new thing I have learned today! I know some other places I can use this kind of merge. – Limey Jan 25 '12 at 20:57
  • I'm using a merge hint to join a view (created from 3 tables) and a table. The view already has an execution plan, but even so the engine creates a new one that is much, much slower. That is another case where you may find it useful. – devarc Jan 25 '12 at 21:11
  • `MERGE` isn't a magic "go faster" hint. If you aren't getting the right plan it is better to first understand why. – Martin Smith Jan 25 '12 at 21:30
  • Of course, I agree with that. I looked at the execution plan and found that it was first joining one big table from the view with the table outside the view, and only then the rest, which is completely odd. – devarc Jan 25 '12 at 21:45
  • I agree, I don't think you can use this everywhere either, but in my situation I knew that this would solve the problem as soon as I read what it would do (there were a lot of other tables I was not showing in the examples that could change the execution plan greatly). – Limey Jan 25 '12 at 21:57
3

For linked servers, the 2nd variant prefetches all the data locally and then does the join, whereas the 1st variant may do an inner loop join with a round trip to the linked server for every row in A.
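The derived table in the question happened to produce the prefetching plan, but the optimizer is not obliged to treat a derived table that way. A minimal sketch of making the prefetch explicit is to copy the small remote table into a temp table first. The temp table name `#b_local` is made up for illustration; the other names come from the question.

```sql
-- Pull the small remote table across the network once.
-- #b_local is a hypothetical name used only for this sketch.
SELECT id, phone
INTO #b_local
FROM remote.table_b;

-- The join now runs entirely on the local server.
SELECT a.*, b.phone
FROM table_a AS a
JOIN #b_local AS b
    ON b.id = a.id;

DROP TABLE #b_local;
```

With only a few hundred rows on the remote side, the one-time copy should be cheap compared with a round trip per row of A.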

Oleg Dok
1

Remote table as in not on that server? Is it possible that the join is actually making multiple calls out to the remote table while the subquery is making a single request for a copy of the table data, thus resulting in less time waiting on network?
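One way to check this rather than guess is to capture the actual execution plan, which reports how many times each operator really ran, whereas the estimated plan only shows estimates. A sketch using the query from the question (names as given there) follows. It assumes the query can be allowed to run to completion, which may not be practical for the slow version.

```sql
SET STATISTICS XML ON;   -- return the actual plan alongside the results

SELECT a.*, b.phone
FROM table_a AS a
JOIN remote.table_b AS b
    ON b.id = a.id;

SET STATISTICS XML OFF;
```

In the returned plan, the things to compare between the two versions are the physical join operator (Nested Loops, Hash Match, or Merge Join) and the actual number of executions of the remote operator.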

Dan Roberts
  • `~600` rows in the remote table. Find something to support the idea that the subquery is resulting in the entire table being cached and this gets my +1 – Yuck Jan 25 '12 at 20:25
  • The estimated execution plan shows it only accessing it once in both versions of the query. The bad one is estimated using 40% of the processing, while the 2nd one is 1% – Limey Jan 25 '12 at 20:27
  • Limey, what is the exact wording in the execution plan? I would be surprised if it actually indicates the number of calls that were made internal to the process. – Dan Roberts Jan 25 '12 at 20:36
  • When using a join, its estimated number of executions is 1. – Limey Jan 25 '12 at 20:41
  • @Limey - What is the join type? Merge? Hash? Nested Loops? If you do a Diff of the XML plans what is different between the two? – Martin Smith Jan 25 '12 at 20:47
  • The join type was inner for both queries. I have tried merge now, and that does resolve the issue. – Limey Jan 25 '12 at 20:55
1

I'm just going to have a guess here. When you access remote.b, is it a table on another server?

If it is, the reason the second query is faster is that you run one query against the other server and get all the fields you need from b before processing the data. In the first query, you are processing data while making several requests to the other server.

Hope this helps.

Adrian Matteo