So I was playing around seeing what could be achieved using Database Migration Service Change Data Capture (CDC), to take data from MSSQL to S3 and also to Redshift.
The Redshift testing was fine: if I delete a record in my source DB, a second or two later the record disappears from Redshift. Same with inserts, updates, etc.
But S3 ...
You get the original record from the first full load.
Then if you update a record in the source, S3 receives a new copy of the record, marked with an 'I'.
If I delete a record, I get another copy of the record, marked with a 'D'.
So my question is: what do I do with all this? How would I query my S3 bucket to see the 'current' state of my data set as it exists in the source database?
Do I have to script some code myself to pick up all these files and process them, applying the inserts, updates, and deletes until I resolve back to a 'normal' data set?
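For what it's worth, here is a minimal sketch of what such a replay script might look like. It assumes (my assumption, not from DMS docs) that each CDC row carries an operation flag as its first column followed by the primary key and the remaining columns; the actual file layout, key column, and flag values would need to be checked against your task settings:

```python
def apply_cdc(snapshot, cdc_rows):
    """Fold a stream of CDC rows into a dict keyed by primary key.

    snapshot: {key: [other column values]} built from the full-load files.
    cdc_rows: iterable of (op_flag, key, *columns) tuples, in arrival order.
    Flags assumed: 'I' insert, 'U' update, 'D' delete.
    """
    for op, key, *rest in cdc_rows:
        if op == 'D':
            snapshot.pop(key, None)   # delete: drop the record if present
        else:
            snapshot[key] = rest      # 'I' or 'U': upsert the record
    return snapshot

# Hypothetical full load: two records keyed by id.
snapshot = {'1': ['alice'], '2': ['bob']}

# Hypothetical CDC stream: update id 1, delete id 2, insert id 3.
changes = [
    ('U', '1', 'alice2'),
    ('D', '2', 'bob'),
    ('I', '3', 'carol'),
]
current = apply_cdc(snapshot, changes)
print(current)  # {'1': ['alice2'], '3': ['carol']}
```

The important detail is that the rows must be applied in the order DMS wrote them, since a later update or delete supersedes an earlier insert for the same key. A query-side alternative would be a window function (row_number over the key, ordered by file timestamp descending) in something like Athena, filtering out rows whose latest operation is 'D'.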
Any insight welcomed !