What is the difference between semi-joins and a subquery? I am currently taking a course on this on DataCamp and i'm having a hard time making a distinction between the two.
Thanks in advance.
What is the difference between semi-joins and a subquery? I am currently taking a course on this on DataCamp and i'm having a hard time making a distinction between the two.
Thanks in advance.
A join or a semi join is required whenever you want to combine two or more entities records based on some common conditional attributes.
Unlike, Subquery is required whenever you want to have a lookup or a reference on same table or other tables
In short, when your requirement is to get additional reference columns added to existing tables attributes then go for join else when you want to have a lookup on records from the same table or other tables but keeping the same existing columns as o/p go for subquery
Also, In case of semi join it can act/used as a subquery because most of the times we dont actually join the right table instead we mantain a check via subquery to limit records in the existing hence semijoin but just that it isnt a subquery by itself
I don't really think of a subquery and a semi-join as anything similar. A subquery is nothing more interesting than a query that is used inside another query:
select * -- this is often called the "outer" query
from (
select columnA -- this is the subquery inside the parentheses
from mytable
where columnB = 'Y'
)
A semi-join is a concept based on join. Of course, joining tables will combine both tables and return the combined rows based on the join criteria. From there you select the columns you want from either table based on further where criteria (and of course whatever else you want to do). The concept of a semi-join is when you want to return rows from the first table only, but you need the 2nd table to decide which rows to return. Example: you want to return the people in a class:
select p.FirstName, p.LastName, p.DOB
from people p
inner join classes c on c.pID = p.pID
where c.ClassName = 'SQL 101'
group by p.pID
This accomplishes the concept of a semi-join. We are only returning columns from the first table (people). The use of the group by is necessary for the concept of a semi-join because a true join can return duplicate rows from the first table (depending on the join criteria). The above example is not often referred to as a semi-join, and is not the most typical way to accomplish it. The following query is a more common method of accomplishing a semi-join:
select FirstName, LastName, DOB
from people
where pID in (select pID
from class
where ClassName = 'SQL 101'
)
There is no formal join here. But we're using the 2nd table to determine which rows from the first table to return. It's a lot like saying if we did join the 2nd table to the first table, what rows from the first table would match? For performance, exists is typically preferred:
select FirstName, LastName, DOB
from people p
where exists (select pID
from class c
where c.pID = p.pID
and c.ClassName = 'SQL 101'
)
In my opinion, this is the most direct way to understand the semi-join. There is still no formal join, but you can see the idea of a join hinted at by the usage of directly matching the first table's pID column to the 2nd table's pID column. Final note. The last 2 queries above each use a subquery to accomplish the concept of a semi-join.