In Cassandra, we only need to keep 100 days of data for specific tables. However, we set the TTL only recently, so data written before that has no TTL and stays in the system as stale data. We are considering different approaches to remove the old data. One suggestion was to create a Spark job that identifies rows older than a specific timeframe and deletes them. Another was to create a new table containing just the last 100 days of data and drop the old table. But I have several doubts about the second approach:
- How do I rename a table while live data is still being written to it?
- How will Cassandra deal with such a table? If I create a new table with less data and rename it on one node (say node 1), will the other nodes in the cluster automatically delete the older data in their tables, or will they sync with the table on node 1 and push all the older data back onto it?
I am really new to Cassandra and would appreciate expert advice on this. Please suggest if there are better ways to handle it.
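
To make the first approach concrete, here is a minimal Python sketch of the logic I have in mind for the cleanup job. It is not a real Spark job and does not talk to Cassandra: the function names, the `(key, written_at)` row shape, and the keyspace, table, and column names are all hypothetical, and it assumes every row carries a timestamp that can be compared against the 100-day cutoff.

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 100  # retention window mentioned above


def cutoff(now, retention_days=RETENTION_DAYS):
    """Return the oldest timestamp that should still be kept."""
    return now - timedelta(days=retention_days)


def expired_keys(rows, now, retention_days=RETENTION_DAYS):
    """Pick the keys of rows written strictly before the cutoff.

    `rows` is an iterable of (key, written_at) pairs; this shape is an
    assumption for illustration, not the real table schema.
    """
    limit = cutoff(now, retention_days)
    return [key for key, written_at in rows if written_at < limit]


def delete_statement(keyspace, table, key_column):
    """Build a parameterised CQL DELETE; `?` is the bind marker used
    with prepared statements. All three names are hypothetical."""
    return f"DELETE FROM {keyspace}.{table} WHERE {key_column} = ?"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [
        ("old-row", now - timedelta(days=150)),
        ("recent-row", now - timedelta(days=10)),
    ]
    print(delete_statement("my_keyspace", "my_table", "id"))
    print(expired_keys(sample, now))
```

In the real job, the sample list would be replaced by rows read from the table, and each expired key would be bound to the DELETE statement.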