I was reading this article about ETL for analytics databases, and I came across this interesting note:
If you discover that your internal applications are deleting data that’s important for analysis, you have two options: either ask your software engineers to modify the application code to avoid deletions, or implement a data pipeline that includes Change Data Capture (CDC). CDC preserves the state of a database at every point in its history so that, even if data is deleted from the production schema, it is still available for analysis. This solution is often far less invasive than re-architecting an application to avoid deletions.
I'm relatively new to these tools. Suppose I have a Ruby on Rails app with typical CRUD actions on a MySQL database. Instead of rewriting my code to preserve data:
- Could I use something like RJMetrics so that I keep all my data without modifying my code?
- If not RJMetrics, are there other services that keep a stream of my data changes so I don't have to rewrite code?
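
To check that I understand the idea in the quoted note, here is a minimal sketch in Python of how I imagine CDC works. Everything in it is hypothetical: `ChangeEvent`, `HistoryTable`, and the event fields are names I made up for illustration, not the API of RJMetrics or any other tool, and a real pipeline would presumably read changes from the database's own log rather than from hand-built objects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One captured change (hypothetical shape, not a real tool's format)."""
    seq: int              # position of the change in the stream
    op: str               # "insert", "update", or "delete"
    pk: int               # primary key of the affected row
    row: Optional[dict]   # row contents after the change; None for a delete


class HistoryTable:
    """Append-only store of change events on the analytics side."""

    def __init__(self) -> None:
        self.events: list[ChangeEvent] = []

    def apply(self, event: ChangeEvent) -> None:
        # Nothing is ever overwritten or removed; deletes are recorded too.
        self.events.append(event)

    def current_state(self) -> dict:
        """Replay the events to rebuild what production looks like now."""
        state: dict = {}
        for event in self.events:
            if event.op == "delete":
                state.pop(event.pk, None)
            else:
                state[event.pk] = event.row
        return state

    def last_known(self, pk: int) -> Optional[dict]:
        """Return the most recent contents of a row, even if it was deleted."""
        for event in reversed(self.events):
            if event.pk == pk and event.row is not None:
                return event.row
        return None


history = HistoryTable()
history.apply(ChangeEvent(1, "insert", 42, {"email": "a@example.com"}))
history.apply(ChangeEvent(2, "update", 42, {"email": "b@example.com"}))
history.apply(ChangeEvent(3, "delete", 42, None))

print(history.current_state())  # row 42 is gone, as in production
print(history.last_known(42))   # but its last contents are still available
```

If that is roughly right, the Rails app keeps issuing ordinary deletes and only the analytics side retains the history, which is what I'm hoping to get without touching application code.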