1

I have two tables (table a and table b) in Hive.

The table a is an init table (full). The table b is a delta table with the latest updates (this table has the same columns and 5 more columns including a timestamp column for the last updates).

I want to create a last view (join between full + delta) from the two tables to have the last updates.

1- The first step I did is to select the rows with the max(timestamp) from table b (delta) to have last updates

2- Then I used it like a subquery in an inner join :

SELECT full.*, delta.* 
FROM table a FULL 
INNER JOIN (SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY timestamp 
            DESC) as rn
            FROM b) delta 
ON f.id = d.id and d.rn = 1

The problem is that I have to specify the columns I want to use in the select and I will have duplicated columns with different values when table b is updated.

I need a condition to always select columns from table a and replace columns in table a with columns in table b when values in table b is different from values in table a.

Any solutions please ?

Please find the data simple image in the link

Lyashko Kirill
  • 513
  • 3
  • 14
  • Make it easy to assist you, and you will get better answers: [mcve]. – jarlh Jan 20 '20 at 11:15
  • Difference in what columns are you interested in? – Lyashko Kirill Jan 20 '20 at 11:50
  • Please provide sample data and desired results. It is unclear what you are trying to accomplish. – Gordon Linoff Jan 20 '20 at 12:31
  • I just added the data simple @GordonLinoff – samsunglolipop Jan 20 '20 at 15:55
  • @LyashkoKirill The table "a" and table "b" has the same columns except that the table b has 4 more columns for the sequence_id, update_date, etc When I use an inner join I have to specify which columns I want in the SELECT clause and since the columns are the same they will be duplicated with different values or same if there is no update – samsunglolipop Jan 20 '20 at 15:58

1 Answers1

0

Seems like you are looking for COALESCE or NVL functions. In that case your query may look like:

SELECT
    A.Person_id as Person_id,               -- Person_id is taken from A table
    A.Training_id as Training_id,           -- Training_id is taken from A table
    coalesce(d.Status, A.Status) as Status, -- If Status from Delta table is not null it is taken otherwise Status from A table is taken
    coalesce(d.Name, A.Name) as Name        -- If Name from Delta table is not null it is taken otherwise Name from A table is taken 
FROM A
LEFT JOIN (SELECT                           -- Left outer join to don't filter out records without updates
                B.Status, 
                B.Name, 
                ROW_NUMBER() OVER (PARTITION BY Person_id ORDER BY timestamp DESC) as rn 
            FROM B) d ON (A.Person_id = d.Person_id and d.rn = 1);
Lyashko Kirill
  • 513
  • 3
  • 14