I am working on a Spark job in Java that downloads data from an API and compares it with the data stored in MongoDB. The downloaded JSON has 15-20 fields, but the database documents have 300 fields.
My task is to compare the downloaded JSON with the MongoDB data and find which fields have changed compared to the stored (past) data.
Sample data set
Downloaded data from API
StudentId,Name,Phone,Email
1,tony,123,a@g.com
2,stark,456,b@g.com
3,spidy,789,c@g.com
MongoDB data
StudentId,Name,Phone,Email,State,City
1,tony,1234,a@g.com,NY,Nowhere
2,stark,456,bg@g.com,NY,Nowhere
3,spidy,789,c@g.com,OH,Nowhere
I can't use except(), because the two datasets have a different number of columns.
Expected output
StudentId,Name,Phone,Email,Past_Phone,Past_Email
1,tony,123,a@g.com,1234, // only the phone number changed
2,stark,456,b@g.com,,bg@g.com // only the email changed
3,spidy,789,c@g.com,,
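
To make the intended comparison concrete, here is a minimal plain-Java sketch of the per-row logic (it is not Spark code, and it is only one possible reading of the expected output). The class ChangedFields, the method diff and the compareFields parameter are hypothetical names introduced for illustration. It assumes each row is available as a field-to-value map, that rows are already matched on StudentId, and that only the listed fields get a Past_ column.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical helper: not part of Spark or the MongoDB driver.
final class ChangedFields {

    // Returns the API row followed by one Past_<field> column per compared field.
    // Past_<field> holds the stored MongoDB value when it differs from the API
    // value, and an empty string when the field is unchanged or missing in MongoDB.
    // Fields that exist only in MongoDB (State, City, ...) are never inspected.
    static Map<String, String> diff(Map<String, String> api,
                                    Map<String, String> mongo,
                                    List<String> compareFields) {
        Map<String, String> out = new LinkedHashMap<>(api); // keeps column order
        for (String field : compareFields) {
            String current = api.get(field);
            String past = mongo.get(field);
            boolean changed = past != null && !Objects.equals(current, past);
            out.put("Past_" + field, changed ? past : "");
        }
        return out;
    }

    public static void main(String[] args) {
        // Row 2 from the sample data above.
        Map<String, String> api = new LinkedHashMap<>();
        api.put("StudentId", "2");
        api.put("Name", "stark");
        api.put("Phone", "456");
        api.put("Email", "b@g.com");

        Map<String, String> mongo = Map.of(
                "StudentId", "2", "Name", "stark", "Phone", "456",
                "Email", "bg@g.com", "State", "NY", "City", "Nowhere");

        Map<String, String> row = diff(api, mongo, List.of("Phone", "Email"));
        // prints: 2,stark,456,b@g.com,,bg@g.com
        System.out.println(String.join(",", row.values()));
    }
}
```

Because only the fields named in compareFields are looked up, the extra MongoDB columns do not matter here, which is the property that except() lacks.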