I need to write a Spark query. Say I have my data in a Delta table (tab) like:
cust  date     acct_id  f1  f2  source_date (dd/mm/yy:h)
b1    1/10/22  acc1     x   y   9/9/22:1 P.M
b1    1/10/22  acc2     x   y   9/9/22:1 P.M
b1    1/10/22  acc3     x   y   9/9/22:1 P.M
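For reference, a minimal sketch of how such a table could be set up, assuming PySpark with the delta-spark package; the `/tmp/tab` path and the string-typed `source_date` column are placeholders for illustration:

```python
from pyspark.sql import SparkSession

# Delta-enabled session; assumes the delta-spark package is on the classpath
spark = (
    SparkSession.builder
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

cols = ["cust", "date", "acct_id", "f1", "f2", "source_date"]
rows = [
    ("b1", "1/10/22", "acc1", "x", "y", "9/9/22:1 P.M"),
    ("b1", "1/10/22", "acc2", "x", "y", "9/9/22:1 P.M"),
    ("b1", "1/10/22", "acc3", "x", "y", "9/9/22:1 P.M"),
]

# Initial snapshot; /tmp/tab is a placeholder path for the Delta table
spark.createDataFrame(rows, cols) \
    .write.format("delta").mode("overwrite").save("/tmp/tab")
```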
Now I am sourcing changes from a DB and simply appending them to my Delta table. For example, at some instant these are the changes I get from the DB (via Kafka or any other message broker), using a scheduler that runs every 2 minutes:

changes:
b1    1/10/22  acc2     p   y   10/9/22:2 P.M
b1    1/10/22  acc2     p   q   11/9/22:4 P.M
b1    1/10/22  acc3     m   n   11/9/22:4 P.M
Then I run a Spark job that just appends the changes to my Delta table (a sketch of the append is shown after the table), so it ends up looking like:
cust  date     acct_id  f1  f2  source_date (dd/mm/yy:h)
b1    1/10/22  acc1     x   y   9/9/22:1 P.M
b1    1/10/22  acc2     x   y   9/9/22:1 P.M
b1    1/10/22  acc3     x   y   9/9/22:1 P.M
b1    1/10/22  acc2     p   y   10/9/22:2 P.M
b1    1/10/22  acc2     p   q   11/9/22:4 P.M
b1    1/10/22  acc3     m   n   11/9/22:4 P.M
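The append job itself is a plain append-only write; continuing the sketch above (the `changes` list here stands in for whatever the scheduler pulled off the broker on this run):

```python
# Change rows picked up on this scheduler run
changes = [
    ("b1", "1/10/22", "acc2", "p", "y", "10/9/22:2 P.M"),
    ("b1", "1/10/22", "acc2", "p", "q", "11/9/22:4 P.M"),
    ("b1", "1/10/22", "acc3", "m", "n", "11/9/22:4 P.M"),
]
cols = ["cust", "date", "acct_id", "f1", "f2", "source_date"]

# Append-only write: history is preserved, nothing is updated in place
spark.createDataFrame(changes, cols) \
    .write.format("delta").mode("append").save("/tmp/tab")
```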
Now I want to serve the queries below using Spark. In each case, for the given cust and date, I need the latest version of every acct_id as of the requested source_date, i.e. the most recent row per account whose source_date is at or before the requested one:
query: get data for cust=b1, date=1/10/22, source_date=9/9/22:1 P.M
output:
b1    1/10/22  acc1     x   y   9/9/22:1 P.M
b1    1/10/22  acc2     x   y   9/9/22:1 P.M
b1    1/10/22  acc3     x   y   9/9/22:1 P.M
query: get data for cust=b1, date=1/10/22, source_date=10/9/22:2 P.M
output:
b1    1/10/22  acc1     x   y   9/9/22:1 P.M
b1    1/10/22  acc2     p   y   10/9/22:2 P.M
b1    1/10/22  acc3     x   y   9/9/22:1 P.M
query: get data for cust=b1, date=1/10/22, source_date=11/9/22:4 P.M
output:
b1    1/10/22  acc1     x   y   9/9/22:1 P.M
b1    1/10/22  acc2     p   q   11/9/22:4 P.M
b1    1/10/22  acc3     m   n   11/9/22:4 P.M
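To make the expected semantics concrete, this is the kind of latest-row-per-key (as-of) lookup I have in mind, sketched with a row_number window in PySpark. The `as_of` helper, the `source_ts` column, and the `"d/M/yy:h a"` timestamp format (after stripping the dots from "P.M") are my own placeholders, not an existing API:

```python
from pyspark.sql import SparkSession, functions as F, Window

# Reuses the Delta-enabled session from the first sketch if one exists
spark = SparkSession.builder.getOrCreate()

# Parse source_date into a real timestamp: lexicographic comparison on the
# raw strings would sort "10/9/22:..." before "9/9/22:...", which is wrong.
# "d/M/yy:h a" is my guess at a matching format once the dots in "P.M" are
# stripped (so "9/9/22:1 P.M" becomes "9/9/22:1 PM").
tab = (
    spark.read.format("delta").load("/tmp/tab")  # placeholder path
    .withColumn(
        "source_ts",
        F.to_timestamp(F.regexp_replace("source_date", r"\.", ""), "d/M/yy:h a"),
    )
)

def as_of(cust, date, source_date):
    """Latest row per acct_id whose source_ts is <= the requested instant."""
    ts = F.to_timestamp(
        F.regexp_replace(F.lit(source_date), r"\.", ""), "d/M/yy:h a"
    )
    w = Window.partitionBy("acct_id").orderBy(F.col("source_ts").desc())
    return (
        tab.where((F.col("cust") == cust) & (F.col("date") == date)
                  & (F.col("source_ts") <= ts))
        .withColumn("rn", F.row_number().over(w))
        .where(F.col("rn") == 1)
        .drop("rn", "source_ts")
    )

# For source_date=10/9/22:2 P.M this should give the second output above:
# acc1 and acc3 still at their 9/9/22:1 P.M versions, acc2 at p/y.
as_of("b1", "1/10/22", "10/9/22:2 P.M").show()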