Our use case is to load bulk data into our live production Cassandra cluster on a daily basis. We came across sstableloader and have a few questions about it:
1: When we load bulk data into our live production cluster using sstableloader, is there a chance of dirty reads? (Basically, does sstableloader load all the data at once, or does it keep updating the cluster as and when it receives data?) Dirty reads are not acceptable in our production environment.
2: Does loading bulk data into our live production cluster affect cluster availability? (Basically, since we are loading a huge amount of data into a live production cluster, does it affect the cluster's performance? Do we need to add nodes to keep the cluster highly available during bulk loading?)
3: If dirty reads are possible in a live production cluster when using sstableloader, please suggest an alternative tool that avoids this issue. We want all of the bulk data to appear at once, not incrementally.
Thanks!