I searched the internet a lot and found many ways to back up and restore a Cassandra cluster, such as `nodetool snapshot` and Medusa. But my question is: can I use DSBulk to back up a Cassandra cluster? What are its limitations? Why doesn't anyone suggest it?

Mostafa Bayat
Answer
It's possible to use it in some cases, but it's not practical. The main reasons are (the list could be longer):
- DSBulk puts additional load on the cluster nodes because it goes through the standard read path. In contrast, `nodetool snapshot` just creates hard links to the data files, so there is no additional load on the nodes.
- It's harder to implement incremental backups with DSBulk. You need to come up with a condition for the SELECT that finds only the data that changed since the last backup, so you need a timestamp column, because you can't put a WHERE condition on the value of the `writetime` function. Plus, it still requires rescanning the whole data set. Plus, it's impossible to find out which data were deleted. With `nodetool snapshot`, you just compare which files have changed since the last backup and back up only those.
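To make the hard-link point concrete, here is a minimal Python sketch, assuming a file system that supports hard links. It is a simplified illustration of the mechanism, not Cassandra's own code: `os.link` adds a second directory entry for the same inode, so no file contents are read or written. The directory layout, the SSTable-like file name, and the functions `snapshot_by_hardlink` and `demo` are hypothetical.

```python
import os
import tempfile


def snapshot_by_hardlink(data_dir: str, snapshot_dir: str) -> list[str]:
    """Hard-link every regular file in data_dir into snapshot_dir.

    No file contents are copied: each link is just another directory entry
    for the same inode, which is why this kind of snapshot is cheap.
    """
    os.makedirs(snapshot_dir, exist_ok=True)
    linked = []
    for name in sorted(os.listdir(data_dir)):
        src = os.path.join(data_dir, name)
        if os.path.isfile(src):
            os.link(src, os.path.join(snapshot_dir, name))
            linked.append(name)
    return linked


def demo() -> tuple[list[str], bool, int]:
    with tempfile.TemporaryDirectory() as root:
        data_dir = os.path.join(root, "data")
        os.makedirs(data_dir)
        # Hypothetical, SSTable-like file name used only for illustration.
        sstable = "nb-1-big-Data.db"
        with open(os.path.join(data_dir, sstable), "wb") as f:
            f.write(b"\0" * 4096)
        snap_dir = os.path.join(root, "snapshots", "backup1")
        linked = snapshot_by_hardlink(data_dir, snap_dir)
        original = os.stat(os.path.join(data_dir, sstable))
        snapshot = os.stat(os.path.join(snap_dir, sstable))
        # Same inode for both names, and the link count is now 2.
        return linked, original.st_ino == snapshot.st_ino, original.st_nlink


print(demo())  # (['nb-1-big-Data.db'], True, 2)
```

Both names point at the same bytes on disk, so taking the snapshot costs only metadata operations, unlike an export that has to read every row.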
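The incremental-backup problem can be simulated without a cluster. The sketch below is a plain-Python stand-in for an export filtered on a hypothetical `updated_at` column: the "table" is just a dict keyed by primary key, and `incremental_export` and `demo` are made-up names, not DSBulk APIs. It shows both drawbacks: every row still has to be read to evaluate the condition, and a row deleted since the last backup simply disappears from the table, so nothing in the export records the deletion.

```python
from datetime import datetime


def incremental_export(table: dict, last_backup: datetime) -> tuple[dict, int]:
    """Return rows whose updated_at is newer than last_backup, plus rows scanned.

    Stand-in for an export filtered on a timestamp column: the condition has
    to be checked against every row, whether or not the row changed.
    """
    changed = {}
    scanned = 0
    for key, row in table.items():
        scanned += 1
        if row["updated_at"] > last_backup:
            changed[key] = row
    return changed, scanned


def demo() -> tuple[list[int], int]:
    t0 = datetime(2021, 9, 1)
    last_backup = datetime(2021, 9, 15)
    t2 = datetime(2021, 9, 20)
    # At the last backup the table had rows 1, 2 and 3, all written at t0.
    # Since then, row 2 was updated at t2 and row 3 was deleted.
    table_now = {
        1: {"value": "a", "updated_at": t0},
        2: {"value": "b2", "updated_at": t2},
    }
    changed, scanned = incremental_export(table_now, last_backup)
    return sorted(changed), scanned


print(demo())  # ([2], 2): the update to row 2 is exported, the deletion of row 3 is not
```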

Alex Ott
- What do you suggest for backing up a Cassandra cluster? Is it better to use `Medusa` or the native `nodetool snapshot` (or some other, better tool)? – Mostafa Bayat Sep 29 '21 at 07:05
- I personally would go with Medusa. `nodetool snapshot` will work, but you'll need to implement all the copying, etc. yourself. – Alex Ott Sep 29 '21 at 07:41
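Since `nodetool snapshot` only produces local hard-linked files, the copying is up to you. A minimal sketch of the incremental part might look like this, assuming SSTable files are immutable once written, so a file name that was already backed up never needs to be copied again. The local `backup_dir` stands in for whatever remote storage you would really use; the JSON manifest, the file names, and the functions `incremental_copy` and `demo` are hypothetical. This is roughly the kind of bookkeeping that a dedicated tool like Medusa handles for you.

```python
import json
import os
import shutil
import tempfile


def incremental_copy(snapshot_dir: str, backup_dir: str, manifest_path: str) -> list[str]:
    """Copy only the snapshot files that are not yet recorded in the manifest."""
    try:
        with open(manifest_path) as f:
            already_backed_up = set(json.load(f))
    except FileNotFoundError:
        already_backed_up = set()  # no manifest yet: first (full) backup

    os.makedirs(backup_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(snapshot_dir)):
        src = os.path.join(snapshot_dir, name)
        if os.path.isfile(src) and name not in already_backed_up:
            shutil.copy2(src, os.path.join(backup_dir, name))
            copied.append(name)

    already_backed_up.update(copied)
    with open(manifest_path, "w") as f:
        json.dump(sorted(already_backed_up), f)
    return copied


def demo() -> tuple[list[str], list[str]]:
    with tempfile.TemporaryDirectory() as root:
        backup_dir = os.path.join(root, "backup")
        manifest = os.path.join(root, "manifest.json")

        def make_snapshot(tag: str, names: list[str]) -> str:
            snap = os.path.join(root, "snapshots", tag)
            os.makedirs(snap)
            for name in names:
                with open(os.path.join(snap, name), "w") as f:
                    f.write(name)
            return snap

        # Hypothetical file names: between the two snapshots, file 1 was
        # compacted away and file 3 was written.
        s1 = make_snapshot("s1", ["nb-1-big-Data.db", "nb-2-big-Data.db"])
        first = incremental_copy(s1, backup_dir, manifest)
        s2 = make_snapshot("s2", ["nb-2-big-Data.db", "nb-3-big-Data.db"])
        second = incremental_copy(s2, backup_dir, manifest)
        return first, second


print(demo())
# (['nb-1-big-Data.db', 'nb-2-big-Data.db'], ['nb-3-big-Data.db'])
```

The second run copies only the new file, which is the file-level comparison that makes snapshot-based incremental backups cheap compared with re-exporting all the data.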