6

I'm trying to see if my understanding of JOINs is correct.

For the following query:

SELECT * FROM tableA 
   join tableB on tableA.someId = tableB.someId
   join tableC on tableA.someId = tableC.someId;

Does the RDMS basically execute similar pseudocode as follows:

List tempResults
for each A_record in tableA
    for each B_record in tableB
        if (A_record.someId = B_record.someId)
            tempResults.add(A_record)

List results 
for each Temp_Record in tempResults
    for each C_record in tableC
        if (Temp_record.someId = C_record.someId)
            results.add(C_record)

return results;

So basically the more records with the same someId tableA has with tableB and tableC, the more records the RDMS have the scan? If all 3 tables have records with same someId, then essentially a full table scan is done on all 3 tables?

Is my understanding correct?

Glide
  • 20,235
  • 26
  • 86
  • 135
  • Nice question. But I am not sure if anyone can answer this. `;)` Definitely I feel the same thing is happening in the behind. – Praveen Kumar Purushothaman Dec 27 '16 at 02:39
  • Wanting to understand T-SQL join statements more? This [joins](https://mva.microsoft.com/en-US/training-courses/querying-with-transactsql-10530?l=BooD4g87_7904984382) is a good piece to understanding them. – Edward Dec 27 '16 at 02:44
  • [**This post**](http://stackoverflow.com/questions/29795400/loop-join-in-sql-server-2008) ii useful. – Vikrant Dec 27 '16 at 02:59
  • I'm not sure what you're asking, is the vendor implementation of `Joining` what concerns you, or the underlying concept from the Relational model? – Jerry Chin Dec 27 '16 at 03:22
  • @JerryChin I'm not interested in vendor implementation - just in general, what's happening inside a RDMS performing `JOIN`s. – Glide Dec 27 '16 at 03:40

1 Answers1

4

Each vendor's query processor is of course written (coded) slightly differently, but they probably share many common techniques. Implementing a join can be done in a variety of ways, and which one is chosen, in any vendor's implementation, will be dependent on the specific situation, but factors that will be considered include whether the data is already sorted by the join attribute, the relative number of records in each table (a join between 20 records in one set of data with a million records in the other will be done differently than one where each set of records is of comparable size). I do not know the internals for MySQL, but for SQL server, there are three different join techniques, a Merge Join, a Loop Join, and a Hash Join. Take a look at this.

Charles Bretana
  • 143,358
  • 22
  • 150
  • 216
  • So is the pseudocode I provided is an example of a ***Loop Join***? – Glide Dec 27 '16 at 06:21
  • Yes, what you did is a loop join. This approach is appropriate if a small set is to be joined with a very large one. You iterate across the small one of course, and for each record there use an index to find the matches in the big one. – Charles Bretana Dec 27 '16 at 13:18