How to combine data from 2 tables -- which join, what conditions?

Question

Consider the following 2 tables.

TableDE

ID  country  key1   key2
------------------------
1   US       1      null
1   US       1      null
1   US       1      null
2   US       null   null
3   US       1      1
4   DE       1      1
5   DE       null   null
5   DE       null   null

TableUS

ID  key1  key2
--------------
1   null  null
2   null  1
4   1     1
8   null  1
2   null  1
2   null  1
9   1     null

I need a distinct overview of all IDs, combining data from both tables:

ID  inTableDe  country  DEkey1  DEkey2  inTableUS  USkey1  USKey2
-----------------------------------------------------------------
1   1          US       1       0       1          0       0
2   1          US       0       0       1          0       1
3   1          US       1       1       0          0       0
4   1          DE       1       1       1          1       1
5   1          DE       0       0       0          0       0
8   0          0        0       1       1          0       1
9   0          0        0       1       1          1       0

I hope it speaks for itself:

ID 8 and ID 9 have 0 in the first column bc they aren't in tableDE
ID 8 and ID 9 have 0 in the country column bc this field doesn't exist in tableUS
ID 3 has 0 in inTableUS bc it only exists in tableDE
the key values are copied from the original tables
an ID is not unique: it can appear many times in both tables. However: the values for key1 and key2 will always be the same for each ID within the same table.

I have been messing for hours now with this; I have this now:

select          de.[ID], 
                de.[country],
                case when (de.[ID] in (select distinct [ID] from [tableDE]) then 1 else 0 end as [inTableDE],
                case when (de.[ID] in (select distinct [ID] from [tableUS]) then 1 else 0 end as [inTableUS],
                de.[key1] as [DEKey1],
                de.[key2] as [DEKey2],
                us.[key1] as [USKey1],
                us.[key2] as [USKey2],
from            dbo.[tableDE] de
full outer join dbo.[tableUS] us on de.[ID] = us.[ID]
where           de.[country] = 'US'
and             (de.[key1] = 1 or de.[key2] = 1 or us.[key1] = 1 or us.[key2] = 1)
group by        de.[ID], us.[ID]

But this keeps giving me only values that are in both tables.

What am I doing wrong?

By steps (easy with no aggregation): 1) create two new tables with select distinct (to avoid the problem of duplicated rows) 2) do the outer join between the new tables 3) add rows with id not present throught a loop in range (0:max_id) Sounds good? — Trics, Dec 17 '20 at 11:57
The `us.[key1]` and `us.[key2]` conditions from your `where` clause are turning your `outer join` to into an _implicite_ `inner join`. In order to apply the `where` clause the SQL engine _requires_ the `us` row to exist. Moving the `where` clause filter to the `join on` predicate is usally the solution. — Sander, Dec 17 '20 at 12:03

score 3 · Accepted Answer · answered Dec 17 '20 at 12:00

You sem to want aggregation on top of the full join:

select          
    coalesce(de.id, us.id)                    as id,
    case when de.id is null then 0 else 1 end as intablede,
    max(de.country)                           as country,
    coalesce(max(de.key1), 0)                 as dekey1,
    coalesce(max(de.key2), 0)                 as dekey2,
    case when us.id is null then 0 else 1 end as intableus,
    coalesce(max(us.key1), 0)                 as uskey1,
    coalesce(max(us.key2), 0)                 as uskey2
from dbo.tablede de
full join dbo.tableus us on de.id = us.id
group by de.id, us.id
order by id

Demo on DB Fiddle:

id | intablede | country | dekey1 | dekey2 | intableus | uskey1 | uskey2
-: | --------: | :------ | -----: | -----: | --------: | -----: | -----:
 1 |         1 | US      |      1 |      0 |         1 |      0 |      0
 2 |         1 | US      |      0 |      0 |         1 |      0 |      1
 3 |         1 | US      |      1 |      1 |         0 |      0 |      0
 4 |         1 | DE      |      1 |      1 |         1 |      1 |      1
 5 |         1 | DE      |      0 |      0 |         0 |      0 |      0
 8 |         0 | null    |      0 |      0 |         1 |      0 |      1
 9 |         0 | null    |      0 |      0 |         1 |      1 |      0

Many thanks for this! I do get an error though: `Operand data type bit is invalid for max operator`. Seems that I have the wrong datatype for the `key` columns. Is there a way around this? — Pr0no, Dec 17 '20 at 12:06
@Pr0no: if you have `bit`s, you need to convert them to integers, as in: `convert(int, de.key1)` instead of `de.key1`. The same logic needs to be applied on all such columns in the query. — GMB, Dec 17 '20 at 12:12

How to combine data from 2 tables -- which join, what conditions?

1 Answers1