Hive : inner join to update "table a" from "table b" where values in "table b" is different from "table a" ? create a last view (a+b)

Question

I have two tables (table a and table b) in Hive.

The table a is an init table (full). The table b is a delta table with the latest updates (this table has the same columns and 5 more columns including a timestamp column for the last updates).

I want to create a last view (join between full + delta) from the two tables to have the last updates.

1- The first step I did is to select the rows with the max(timestamp) from table b (delta) to have last updates

2- Then I used it like a subquery in an inner join :

SELECT full.*, delta.* 
FROM table a FULL 
INNER JOIN (SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY timestamp 
            DESC) as rn
            FROM b) delta 
ON f.id = d.id and d.rn = 1

The problem is that I have to specify the columns I want to use in the select and I will have duplicated columns with different values when table b is updated.

I need a condition to always select columns from table a and replace columns in table a with columns in table b when values in table b is different from values in table a.

Any solutions please ?

Make it easy to assist you, and you will get better answers: [mcve]. — jarlh, Jan 20 '20 at 11:15
Please provide sample data and desired results. It is unclear what you are trying to accomplish. — Gordon Linoff, Jan 20 '20 at 12:31
@LyashkoKirill The table "a" and table "b" has the same columns except that the table b has 4 more columns for the sequence_id, update_date, etc When I use an inner join I have to specify which columns I want in the SELECT clause and since the columns are the same they will be duplicated with different values or same if there is no update — samsunglolipop, Jan 20 '20 at 15:58

score 0 · Accepted Answer · answered Jan 21 '20 at 10:28

Seems like you are looking for COALESCE or NVL functions. In that case your query may look like:

SELECT
    A.Person_id as Person_id,               -- Person_id is taken from A table
    A.Training_id as Training_id,           -- Training_id is taken from A table
    coalesce(d.Status, A.Status) as Status, -- If Status from Delta table is not null it is taken otherwise Status from A table is taken
    coalesce(d.Name, A.Name) as Name        -- If Name from Delta table is not null it is taken otherwise Name from A table is taken 
FROM A
LEFT JOIN (SELECT                           -- Left outer join to don't filter out records without updates
                B.Status, 
                B.Name, 
                ROW_NUMBER() OVER (PARTITION BY Person_id ORDER BY timestamp DESC) as rn 
            FROM B) d ON (A.Person_id = d.Person_id and d.rn = 1);

Hive : inner join to update "table a" from "table b" where values in "table b" is different from "table a" ? create a last view (a+b)

1 Answers1