I am trying to implement a logic in Redshift Spectrum where my original table looks like below:
Records in the student table:
1 || student1 || Boston || 2019-01-01
2 || student2 || New York || 2019-02-01
3 || student3 || Chicago || 2019-03-01
1 || student1 || Dallas || 2019-03-01
Records in the incremental table studentinc looks like below:
1 || student1 || SFO || 2019-04-01
4 || student4 || Detroit || 2019-04-01
By joining both student and studentinc tables, I am trying to get the latest set of records which should look like below:
2 || student2 || New York || 2019-02-01
3 || student3 || Chicago || 2019-03-01
1 || student1 || SFO || 2019-04-01
4 || student4 || Detroit || 2019-04-01
I have got this solution by doing UNION of both student and studentinc, then querying the result of union based on max(modified_ts). However, this solution isn't good for huge tables, is there a better solution which works by joining both the tables?