Full Join and get left table columns always populated even if they are null

Question

I have these two tables in bigquery:

Lets say table people

Name	Country	Continent
Adam	null	55
Adam	1	55
Josh	4	55
Josh	null	55
Josh	3	55

And table country:

Country	Continent
1	55
2	55
3	55
4	55
1	44
2	44
3	44
4	44
5	44
1	22
2	22
3	22

I would like to obtain as result a query that gives me for every person one row per country where the country and continent matches. If the country is null there should be a row as well.

I understand I wanna do a full join but somehow I am complicating myself. The result I would like would be:

Name	Country	Continent
Adam	null	55
Adam	1	55
Adam	2	55
Adam	3	55
Adam	4	55
Josh	null	55
Josh	1	55
Josh	2	55
Josh	3	55
Josh	4	55

I know somehow I want to use full join, but this gives me continents that i am not interested in and also the name is not always populated

select 
    a.name
  , coalesce(a.country, b.country) as country
  , coalesce(a.continent, b.continent) as continent
from 
   people a FULL JOIN 
     country b 
      on a.country = b.country
        and a.continent = b.continent

xQbert · Accepted Answer · 2023-03-15T12:54:36.913

First we have to recognize what you're after is all countries for those names which have a continent and country defined. But when a country is NULL for a name, you do not want all countries. These are two distinct rules governing the desired output. This is why it's not 1 simple join type because you have two distinct rules.

So if we can assume you want: For each Name, all the countries associated to the continent and include those where a country isn't defined but not repeated for each country.

I do not see a way to do this with just one join type.

In this example, an incomplete join on people to country is being used. The join includes only the key condition continent partial join.

ResultSet as (
SELECT P.Name, C.Country, C.Continent 
FROM PEOPLE P
FULL OUTER JOIN COUNTRY  C
 on P.Continent = C.Continent
WHERE P.country is not null

UNION 

SELECT P.Name, P.Country, P.Continent 
FROM PEOPLE P
WHERE P.country is null)

SELECT *
FROM ResultSet
Order by Name, Country, Continent

Giving us:

+------+---------+-----------+
| Name | Country | Continent |
+------+---------+-----------+
| Adam | null    |        55 |
| Adam | 1       |        55 |
| Adam | 2       |        55 |
| Adam | 3       |        55 |
| Adam | 4       |        55 |
| Josh | null    |        55 |
| Josh | 1       |        55 |
| Josh | 2       |        55 |
| Josh | 3       |        55 |
| Josh | 4       |        55 |
+------+---------+-----------+

But that's not optimal. We are generating records we end up excluding in the union though its embedded use of distinct

This second approach is a little cleaner. Here we use a subquery to get a distinct list of names per continent then cross join to get the countries based on like continents. Then we union back in the null records since we don't want multiple null values for each country.

ResultSet as (
SELECT P.Name, C.Country, C.Continent 
FROM (Select distinct P.Name, p.Continent FROM PEOPLE P where P.Country is not null) P
CROSS JOIN COUNTRY  C
WHERE P.Continent = C.Continent
    
UNION 
SELECT P.Name, P.Country, P.Continent 
FROM PEOPLE P
WHERE P.country is null)
SELECT *
FROM ResultSet
Order by Name, Country, Continent

My general rule of thumb is: if a record doesn't exist in a data table and we need to "Build it" a cross join should be used. It becomes apparent to future maintenance we had to construct "Missing" data. Partial/incomplete joins hide this and may make people think you missed a join.

Full Join and get left table columns always populated even if they are null

1 Answers1