Is it possible to backup and restore Cassandra cluster using dsbulk?

Question

I searched the internet a lot and saw many ways to back up and restore a Cassandra cluster, such as nodetool snapshot and Medusa. My question is: can I use dsbulk to back up a Cassandra cluster? What are its limitations? Why doesn't anyone suggest it?

score 4 · Accepted Answer · answered Sep 28 '21 at 17:33

It's possible to use it in some cases, but it's not practical. These are the primary reasons (the list could be longer):

- DSBulk puts additional load on the cluster nodes because it goes through the standard read path. In contrast, nodetool snapshot just creates hardlinks to the data files, so it adds no read load to the nodes.
- It's harder to implement incremental backups with DSBulk. You need to come up with a condition for the SELECT that finds only the data changed since the last backup, which means you need a timestamp column, because you can't put a WHERE condition on the value of the writetime function. It will also require rescanning all the data anyway, and it's impossible to find out which data was deleted. With nodetool snapshot, you just compare which files have changed since the last backup and back up only those.
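
The file comparison in the last point can be sketched roughly as follows. This is only an illustration, not how Medusa or any other tool actually does it: it assumes the previous backup left a manifest of relative file paths, and that a file with the same name has the same content (SSTable files are immutable once written). The function name `files_to_upload` and the manifest format are hypothetical.

```python
from pathlib import Path


def files_to_upload(snapshot_dir: Path, already_backed_up: set[str]) -> list[Path]:
    """Return snapshot files that are missing from the previous backup.

    `already_backed_up` is a hypothetical manifest: the set of file paths,
    relative to the snapshot directory, that an earlier backup stored.
    """
    new_files = []
    for path in sorted(snapshot_dir.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(snapshot_dir).as_posix()
        # Assumption: same relative name means same content, so the file
        # can be skipped. A real tool would likely also compare sizes or
        # checksums before trusting the name alone.
        if rel not in already_backed_up:
            new_files.append(path)
    return new_files
```

Only the returned files need to be copied to backup storage; everything else is already there from the previous run. Nothing comparable is available with DSBulk, because its output is a fresh export of rows rather than a set of immutable files.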

What would you suggest for backing up a Cassandra cluster? Is it better to use `Medusa` or native `nodetool snapshot` (or some other, better tool)? — Mostafa Bayat, Sep 29 '21 at 07:05
I personally would go with Medusa. `nodetool snapshot` will work, but you'll need to implement all the copying, etc. yourself — Alex Ott, Sep 29 '21 at 07:41
