I have a very large Cassandra table with about 13 million entries. It serves as a kind of lookup table, which means there are no writes, only reads. I use DataStax Enterprise 4.8 (which includes Cassandra 2.1).
The content is mostly static, but every few months it gets updated. The problem is that some of the old data becomes outdated while new data appears. Outdated entries are not overwritten by the update, so they stay in the table. They have to be removed to keep the database clean.
I have one requirement: the database must be available during the update. It is okay to have a short period (a few minutes) where old and new data exist side by side.
I already thought about the following solutions:
- Write the new table directly as an SSTable and exchange it with the old one
- Do the update as a batch, with a truncate of the old data at the beginning
- Create a new table (with a new name) and switch the table used by the program while it is running
- Add a version column, add the new data with a new version, and delete the old data (with the old version) afterwards
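To make the last option more concrete, here is a minimal sketch of the flow I have in mind. It is only an in-memory stand-in written in Python, not real Cassandra code: the class `VersionedLookup` and its methods `load`, `activate`, `purge_except` and `get` are hypothetical names I made up for illustration, and it assumes the version would be part of the key so that old and new rows can coexist.

```python
import threading


class VersionedLookup:
    """In-memory stand-in for a lookup table keyed by (version, key).

    Hypothetical illustration only; this is not a Cassandra API.
    """

    def __init__(self):
        self._rows = {}              # (version, key) -> value
        self._active_version = None  # version that readers currently see
        self._lock = threading.Lock()

    def load(self, version, entries):
        # Write the new data set next to the old one; readers are unaffected.
        with self._lock:
            for key, value in entries.items():
                self._rows[(version, key)] = value

    def activate(self, version):
        # Switch all readers to the given version in a single step.
        with self._lock:
            self._active_version = version

    def purge_except(self, version):
        # Drop every row that does not belong to the given version.
        with self._lock:
            self._rows = {k: v for k, v in self._rows.items() if k[0] == version}

    def get(self, key):
        # Readers only ever see rows of the active version.
        with self._lock:
            return self._rows.get((self._active_version, key))


table = VersionedLookup()
table.load(1, {"a": "old-a", "b": "old-b"})  # initial content
table.activate(1)

table.load(2, {"a": "new-a"})                # update: old and new side by side
table.activate(2)                            # readers switch to the new data
table.purge_except(2)                        # cleanup of the outdated rows
```

In this sketch the table stays readable the whole time, and the window where old and new data coexist is the time between `load` and `purge_except`. What I do not know is how well the delete step would behave on a real Cassandra table of this size.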
Which of these solutions is the best one? Or, even better, is there a more elegant way to solve my problem?