1

HIVE SQL - I have 2 tables. survey and survey_comments and the structure is as shown below:

create external table if not exists survey(
id string,
category_name string, 
subcategory_name string)
STORED AS parquet;

insert into survey(id, category_name, subcategory_name) 
values ('1', 'Engine', 'Engine problem other than listed');
insert into survey(id, category_name, subcategory_name) 
values ('1', 'Exterior Body', 'Color match of painted parts');
insert into survey(id, category_name, subcategory_name) 
values ('1', 'Exterior Body', 'Tail lights');
insert into survey(id, category_name, subcategory_name) 
values ('1', 'Heating/Ventilation and Cooling', 'Front windshield fogs up');
insert into survey(id, category_name, subcategory_name) 
values ('1', 'Transmission', 'Rough shifting');

create external table if not exists survey_comments(
id string,
category_name_txt string, 
subcategory_name_txt string,
comments string)
STORED AS parquet;

insert into survey_comments(id, category_name_txt, subcategory_name_txt) 
values ('1', 'Exterior Body', 'Tail lights', 'Moisture in lower portion of rear tail lights along with leaves etc.');
insert into survey_comments(id, category_name_txt, subcategory_name_txt) 
values ('1', 'Heating/Ventilation and Cooling', 'Front windshield fogs up', 'Small amount of fog low on front windshield during/after rain.');
insert into survey_comments(id, category_name_txt, subcategory_name_txt) 
values ('1', 'Miscellaneous', 'General problem other than listed', 'When filling vehicle with gas; the pumps fill the gas line too quickly, had to hold the pump handle only 1/2 way on.');
insert into survey_comments(id, category_name_txt, subcategory_name_txt) 
values ('1', 'Miscellaneous', 'General problem other than listed', 'Touch-up paint too red, not same red as on the car.');

Now my full outer join is as shown below:

select b.id, b.category_name, b.subcategory_name, a.category_name_txt, a.sub_category_name_txt, a.comments
from   survey b full outer join survey_comments a 
on (
b.id = a.id and              
b.category_name     = a.category_name_txt and                      b.subcategory_name  = a.sub_category_name_txt
)

I am not getting the rows in survey_comment_txt that has category_name as 'Miscellaneous'. I need non matching rows in survey and survey_comments as seperate rows and matching rows as well. What am I doing wrong.

JJJ
  • 32,902
  • 20
  • 89
  • 102
user1509593
  • 1,025
  • 1
  • 10
  • 20
  • Learn what FULL JOIN ON returns: INNER JOIN ON rows UNION ALL unmatched left & right table rows extended by NULLs. Always know what INNER JOIN you want as part of an OUTER JOIN. A WHERE or INNER JOIN ON that requires a right/left/both table column to be not NULL after an OUTER JOIN ON removes any rows from the table(s) extended by NULLs, ie leaves only LEFT/RIGHT/INNER JOIN ON rows, ie "turns OUTER JOIN into INNER JOIN". You have that. – philipxy Jun 15 '19 at 12:06
  • Possible duplicate of [Full outer join not returning all rows?](https://stackoverflow.com/questions/16167870/full-outer-join-not-returning-all-rows) – philipxy Jun 15 '19 at 12:12

1 Answers1

0

Tested using CTEs, have found two issues. 1st: You are inserting four columns in the survey_comments, but named only three. 2nd: in the query column name should be subcategory_name_txt, not sub_category_name_txt.

Test after fixing:

with survey as (
select stack(5,
'1', 'Engine', 'Engine problem other than listed',
'1', 'Exterior Body', 'Color match of painted parts',
'1', 'Exterior Body', 'Tail lights',
'1', 'Heating/Ventilation and Cooling', 'Front windshield fogs up',
'1', 'Transmission', 'Rough shifting') as (id, category_name, subcategory_name) 
),

survey_comments as (
select stack(4,

'1', 'Exterior Body', 'Tail lights', 'Moisture in lower portion of rear tail lights along with leaves etc.',
'1', 'Heating/Ventilation and Cooling', 'Front windshield fogs up', 'Small amount of fog low on front windshield during/after rain.',
'1', 'Miscellaneous', 'General problem other than listed', 'When filling vehicle with gas; the pumps fill the gas line too quickly, had to hold the pump handle only 1/2 way on.',
'1', 'Miscellaneous', 'General problem other than listed', 'Touch-up paint too red, not same red as on the car.') as (id, category_name_txt, subcategory_name_txt,comments)
)

select b.id, b.category_name, b.subcategory_name, a.category_name_txt, a.subcategory_name_txt, a.comments
from   survey b full outer join survey_comments a 
on (
b.id = a.id and              
b.category_name     = a.category_name_txt and b.subcategory_name  = a.subcategory_name_txt
)

Returns:

b.id    b.category_name b.subcategory_name  a.category_name_txt a.subcategory_name_txt  a.comments  
1   Engine  Engine problem other than listed    NULL    NULL    NULL    
1   Exterior Body   Color match of painted parts    NULL    NULL    NULL    
1   Exterior Body   Tail lights Exterior Body   Tail lights Moisture in lower portion of rear tail lights along with leaves etc.    
1   Heating/Ventilation and Cooling Front windshield fogs up    Heating/Ventilation and Cooling Front windshield fogs up    Small amount of fog low on front windshield during/after rain.  
NULL    NULL    NULL    Miscellaneous   General problem other than listed   Touch-up paint too red, not same red as on the car. 
NULL    NULL    NULL    Miscellaneous   General problem other than listed   When filling vehicle with gas; the pumps fill the gas line too quickly, had to hold the pump handle only 1/2 way on.    
1   Transmission    Rough shifting  NULL    NULL    NULL    

Miscellaneous is returned.

leftjoin
  • 36,950
  • 8
  • 57
  • 116