I have two tables (table a and table b) in Hive.
The table a is an init table (full). The table b is a delta table with the latest updates (this table has the same columns and 5 more columns including a timestamp column for the last updates).
I want to create a last view (join between full + delta) from the two tables to have the last updates.
1- The first step I did is to select the rows with the max(timestamp) from table b (delta) to have last updates
2- Then I used it like a subquery in an inner join :
SELECT full.*, delta.*
FROM table a FULL
INNER JOIN (SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY timestamp
DESC) as rn
FROM b) delta
ON f.id = d.id and d.rn = 1
The problem is that I have to specify the columns I want to use in the select and I will have duplicated columns with different values when table b is updated.
I need a condition to always select columns from table a and replace columns in table a with columns in table b when values in table b is different from values in table a.
Any solutions please ?