
I have an existing table with the following columns. The record key is emp_id, the precombine field is log_ts, and the partition field is log_dt.

emp_id   emp_name   log_ts                load_ts               log_dt
1        neo        2023-08-04 12:00:00   2023-08-04 12:00:00   2023-08-04

The incoming batch will look like this (emp_id 1 is updated, emp_id 2 is a brand-new record):

emp_id   emp_name   log_ts
1        neo        2023-08-05 14:00:00
2        trinity    2023-08-05 14:00:00

My desired output is shown below. For the first record, log_ts and log_dt are updated, but load_ts must not change because it captures when the record was first loaded. For the second record, it should be an insert with the current load_ts.

emp_id   emp_name   log_ts                load_ts               log_dt
1        neo        2023-08-05 14:00:00   2023-08-04 12:00:00   2023-08-05
2        trinity    2023-08-05 14:00:00   2023-08-05 15:00:00   2023-08-05
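
To make the expected behaviour concrete, here is a rough plain-Python sketch of the merge semantics I'm after. It does not use Hudi at all; the function name merge_batch and the dict-based rows are purely illustrative assumptions, and log_dt is assumed to be derived from the date part of log_ts.

```python
from datetime import datetime


def merge_batch(existing, incoming, load_ts):
    """Upsert incoming rows into existing rows, keyed by emp_id (illustrative only)."""
    table = {row["emp_id"]: dict(row) for row in existing}
    for row in incoming:
        current = table.get(row["emp_id"])
        # Assumption: the partition value log_dt is the date part of log_ts.
        log_dt = row["log_ts"].date().isoformat()
        if current is None:
            # New key: insert with the current load timestamp.
            table[row["emp_id"]] = {**row, "load_ts": load_ts, "log_dt": log_dt}
        elif row["log_ts"] >= current["log_ts"]:
            # Existing key with a newer (or equal) precombine value:
            # overwrite only the columns present in the batch, keep the old load_ts.
            table[row["emp_id"]] = {**current, **row, "log_dt": log_dt}
    return sorted(table.values(), key=lambda r: r["emp_id"])


existing = [
    {"emp_id": 1, "emp_name": "neo",
     "log_ts": datetime(2023, 8, 4, 12), "load_ts": datetime(2023, 8, 4, 12),
     "log_dt": "2023-08-04"},
]
incoming = [
    {"emp_id": 1, "emp_name": "neo", "log_ts": datetime(2023, 8, 5, 14)},
    {"emp_id": 2, "emp_name": "trinity", "log_ts": datetime(2023, 8, 5, 14)},
]

# load_ts for this run is assumed to be 2023-08-05 15:00:00, as in the table above.
result = merge_batch(existing, incoming, load_ts=datetime(2023, 8, 5, 15))
for row in result:
    print(row)
```

Note that emp_id 1 also moves from partition 2023-08-04 to 2023-08-05 in this sketch, which is part of what I need the real table to do.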

Can PartialUpdateAvroPayload be used to achieve both scenarios (a new insert with all columns, and an upsert with only a few columns)? If yes, how should the properties be set to achieve this?

praneethh