-1

I am doing full outer join on 4 tables on the same column. I want to generate only 1 row for each different value in the Join column.

Inputs are:

employee1
+---------------------+-----------------+--+
| employee1.personid  | employee1.name  |
+---------------------+-----------------+--+
| 111                 | aaa             |
| 222                 | bbb             |   
| 333                 | ccc             | 
+---------------------+-----------------+--+
employee2
+---------------------+----------------+--+
| employee2.personid  | employee2.sal  |
+---------------------+----------------+--+
| 111                 | 2              |
| 200                 | 3              |
+---------------------+----------------+--+
employee3
+---------------------+------------------+--+
| employee3.personid  | employee3.place  |
+---------------------+------------------+--+
| 111                 | bbsr             |
| 300                 | atl              |
| 200                 | ny               |
+---------------------+------------------+--+
employee4
+---------------------+---------------+--+
| employee4.personid  | employee4.dt  |
+---------------------+---------------+--+
| 111                 | 2019-02-21    |
| 300                 | 2019-03-18    |
| 400                 | 2019-03-18    |
+---------------------+---------------+--+

Expected Result one record for each personid, so total there should be 6 records(111,222,333,200,300,400) Like:

+-----------+---------+--------+----------+-------------+--+
| personid  | f.name  | u.sal  | v.place  |   v_in.dt   |
+-----------+---------+--------+----------+-------------+--+
| 111       | aaa     | 2      | bbsr     | 2019-02-21  |
| 200       | NULL    | 3      | ny       | NULL        |
| 222       | bbb     | NULL   | NULL     | NULL        |
| 300       | NULL    | NULL   | atl      | 2019-03-18  |
| 333       | ccc     | NULL   | NULL     | NULL        |
| 400       | NULL    | NULL   | NULL     | 2019-03-18  |
+-----------+---------+--------+----------+-------------+--+

Result i am getting is:

+-----------+---------+--------+----------+-------------+--+
| personid  | f.name  | u.sal  | v.place  |   v_in.dt   |
+-----------+---------+--------+----------+-------------+--+
| 111       | aaa     | 2      | bbsr     | 2019-02-21  |
| 200       | NULL    | 3      | NULL     | NULL        |
| 200       | NULL    | NULL   | ny       | NULL        |
| 222       | bbb     | NULL   | NULL     | NULL        |
| 300       | NULL    | NULL   | atl      | NULL        |
| 300       | NULL    | NULL   | NULL     | 2019-03-18  |
| 333       | ccc     | NULL   | NULL     | NULL        |
| 400       | NULL    | NULL   | NULL     | 2019-03-18  |
+-----------+---------+--------+----------+-------------+--+

Query used:

select coalesce(f.personid, u.personid, v.personid, v_in.personid) as personid,f.name,u.sal,v.place,v_in.dt
from employee1 f FULL OUTER JOIN employee2 u on f.personid=u.personid
FULL OUTER JOIN employee3 v on f.personid=v.personid
FULL OUTER JOIN employee4 v_in on f.personid=v_in.personid;

Please suggest how to generate the expected result.

philipxy
  • 14,867
  • 6
  • 39
  • 83
user1326784
  • 627
  • 3
  • 11
  • 31
  • Learn what full join on returns: inner join on rows plus unmatched left & right table rows extended by nulls. Always know what inner join you want as part of an outer join. PS Please in code questions give a [mcve]--cut & paste & runnable code plus desired output plus clear specification & explanation. Minimal means adding minimal problem code to minimal working code. So give minimal code that you show does what you expect & minimal code with the first place you go wrong. (Debugging fundamental.) – philipxy Mar 18 '19 at 19:23
  • 1
    Please do not repost questions. Edit them. If you want even more attention then place a bounty. This is a duplicate of https://stackoverflow.com/q/55190125/3404097 with some names changed & no group by. Clarify via edits, not comments. (Or new posts.) It's good that you dropped the group by so that you simplified your problem but you should have done that by editing your original question post since it did not invalidate any answer posts. – philipxy Mar 18 '19 at 23:37
  • Possible duplicate of [Hive Full Outer Join with 4 Tables on same Key, different schema](https://stackoverflow.com/questions/55190125/hive-full-outer-join-with-4-tables-on-same-key-different-schema) – philipxy Mar 18 '19 at 23:38

2 Answers2

0

full outer join is tricky, because you have to take previous NULLs into account. But you can do:

select coalesce(f.personid, u.personid, v.personid, v_in.personid) as personid,f.name,u.sal,v.place,v_in.dt
from employee1 f FULL OUTER JOIN
     employee2 u
     on f.personid = u.personid FULL OUTER JOIN
     employee3 v
     on v.personid in (f.person_id, u.person_id) FULL OUTER JOIN
     employee4 v_in
     on v_in.personid in (f.person_id, u.person_id, v.person_id);

In databases that support using for joins (instead of on) this is simpler. I don't think that Hive supports using, though.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • I already trying ths solution, it cant run... It rise error : both left and right aliases encountered in join 'ID' what went wrong? – thecardcaptor Jan 13 '21 at 15:51
0

FULL JOIN returns all rows joined + all not joined from the left table + all not joined from the right table. And since you are joining employee2, employee3, employee4 to the same employee1 table which does not contain personid=200, all not joined rows returned from all four tables.

I'd suggest to UNION ALL all four tables providing NULLs for missing fields + aggregate group by personid:

select personid, max(name) name, max(sal) sal, max(place) place, max(dt) dt 
from 
(
select  personid, name, NULL sal, NULL place, NULL dt from employee1  e1
UNION ALL
select  personid, NULL name, sal, NULL place, NULL dt from employee2  e2
UNION ALL
select  personid, NULL name, NULL sal, place, NULL dt from employee3  e3
UNION ALL
select  personid, NULL name, NULL sal, NULL place, dt from employee4  e4
)s
group by personid;

This will perform better than joins.

leftjoin
  • 36,950
  • 8
  • 57
  • 116