You ask a very basic question and you got downvoted for it. When looking at the answers, they are good and factual correct. But I think the answers are unnecessary complex, because such a basic question needs a answer on very basic level. Otherwise the answer won't help you very much, when you face a similar problem.
So let's start on how to tackle your problem on a basic level.
Note: You can find all the code in a live fiddle.
A basic approach
For a very basic approach, we will first solve the following, easy sub-problems:
- Which fathers have two or more sons
- Which fathers have a son named Bill
- Which fathers have a son named Suzy
We will then combine these partial results using set theory: We look for all fathers that are in the first and second group but not in the third. So we are looking for the following set:
(Problem 1) AND (Problem 2) MINUS (Problem 3).
The nice thing about this approach is, that we get a query for each sub-problem that can be run on itself. So it is very easy to follow the approach with pen and paper and to debug the complete solution. Still we get a very efficient query, that can be nicely optimized by the database engine.
Fathers with two or more sons
Here we need only to access the sons. We take all the sons, group the ones with the same father together and then count, how many are in each group.
SELECT s.fatherid FROM
Sons s
GROUP BY s.fatherid
HAVING COUNT(s.fatherid) >= 2;
You can run this query and easily check the result. In this case you will get the fathers with id 1,2 and 3.
Fathers with son Bill
The second query is very straight forward. We take all the sons, filter them by name and then get their father:
SELECT s.fatherid FROM
Sons s
WHERE
s.name = 'bill';
You will get the fathers with the id 1,3 and 4.
Fathers with son Suzi
This is similar to the previous approach. You should not have any trouble with this query>
SELECT s.fatherid FROM
Sons s
WHERE
s.name = 'bill';
You will get the fathers with the id 1 and 2.
Combining the queries
Now let's combine the previous queries. For this you should know the SQL operators INTERSECT and EXCEPT (sometimes called MINUS). With these operators, you can combine the queries we built before to one query:
SELECT s.fatherid FROM
Sons s
GROUP BY s.fatherid
HAVING COUNT(s.fatherid) >= 2
INTERSECT
SELECT s.fatherid FROM
Sons s
WHERE
s.name = 'bill'
EXCEPT
SELECT s.fatherid FROM
Sons s
WHERE
s.name = 'suzy'
;
So you get the following result:
{1,2,3} INTERSECT {1,3,4} MINUS {1,2} = {3}
Hint: In the fiddle you can mark the part of the query you want to execute and press "Run SQL". This will just run the partial query you have marked and return the according partial result.
Attention: Things get a little bit dirty, when you don't have real sets. This is the case, when one of the sub-queries can return the same father-id twice (or even more often). In this case you could write SELECT DISTINCT
to clean up the intermediate result. If you know for sure, that each fatherid is there at most once, you could write DISTINCT ALL
or EXCEPT ALL
as an optimization. But with the code above, you don't have to worry. EXCEPT
and DISTINCT
(without the ALL
) will handle duplicates just fine.
Hint: Set theory is very good for parallelization. Writing the query this way, your database can optimize the query by running the sub-queries in parallel or by using parallelized vector operations (e.g together with so called bitmap indexes).
A more advanced approach
If you have a situation where you cannot rely on simple set theory, you can use a more advanced approach using JOINS. Joins are a usually expensive. Hence such a approach needs some care regarding performance. Additionally joins interweave the partial queries in a way, so they can't be executed independently any more.
Semi-Join
To find all fathers having a son called Bill, you should use a so called semi-join (WHERE EXISTS
). The semi join tells the database engine, that you are not interested in any particular properties of the son. You only want to know if the on exists or not. This usually allows the database engine to use some special optimizations, e.g. only using indexes and bypass the underlying table completely:
SELECT f.id FROM
Fathers f
WHERE
EXISTS (
SELECT *
FROM Sons s
WHERE
f.id = s.fatherid
AND s.name = 'bill'
)
;
Attention: An anti join is sometimes written as IN
. Some database engines have problems, when the IN
returns more than a few thousand results. Additionally the IN
does not always unlock the optimizations reserved for a semi join.
Anti-Join
For the son called Suzy you can use a so called anti-join (WHERE NOT EXISTS
). This works similar to the semi join, but inverses the result set. Again, you tell the database engine that you are only interested in the existence of certain rows of a table, unlocking some very special optimizations:
SELECT f.id FROM
Fathers f
WHERE
NOT EXISTS (
SELECT *
FROM Sons s
WHERE
f.id = s.fatherid
AND s.name = 'suzy'
)
;
Attention: You cant't replace an anti-join with a semi join when you write s.name <> 'suzy'
. This would return all fathers that have any son whose name is not Suzy (e.g. with a son called Bill) and not the fathers who don't have any son called Suzy.
Attention: An anti join is sometimes written as NOT IN
. Some database engines have problems, when the NOT IN
returns more than a few thousand results. Additionally the NOT IN
does not always unlock the optimizations reserved for an anti join.
Combining into one query
Together with the query that returns all fathers with two or more sons, you can combine the semi join and the anti join into one query:
SELECT f.id FROM
Fathers f JOIN Sons s ON (f.id = s.fatherid)
WHERE
EXISTS (
SELECT *
FROM Sons s2
WHERE
f.id = s2.fatherid
AND s2.name = 'bill'
)
AND
NOT EXISTS (
SELECT *
FROM Sons s3
WHERE
f.id = s3.fatherid
AND s3.name = 'suzy'
)
GROUP BY f.id
HAVING COUNT(s.id) >= 2
;
Drawbacks: In this query you cannot run the subqueries independent of each other. Additionally joins are usually quite expensive. So you the query might run much slower than the pure set oriented approach.