I know that count() in Cassandra is an expensive operation because it requires a full table scan. https://www.datastax.com/blog/running-count-expensive-cassandra
But let's say we have a table hotel with hotel_type as the partition key, and we run this query:
select count(*) from hotel where hotel_type= 'luxury';
Will this be expensive too? I actually need to run about 1 million queries like this to get the count for each distinct hotel_type. Could running these counts impact the production Cassandra cluster?
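To make the workload concrete, here is a minimal sketch of the kind of loop I have in mind: one single-partition count per hotel_type, with a simple client-side rate limit. The names count_by_hotel_type, run_count and max_per_second are hypothetical and only for illustration; run_count stands in for whatever actually executes the query against the cluster.

```python
import time
from typing import Callable, Dict, Iterable

# The single-partition count from the question, as a prepared-statement template.
COUNT_CQL = "SELECT count(*) FROM hotel WHERE hotel_type = ?"


def count_by_hotel_type(
    run_count: Callable[[str], int],
    hotel_types: Iterable[str],
    max_per_second: float = 100.0,  # hypothetical throttle, tune for the cluster
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, int]:
    """Run one count per partition key, issuing at most max_per_second queries."""
    min_interval = 1.0 / max_per_second
    counts: Dict[str, int] = {}
    for hotel_type in hotel_types:
        started = time.monotonic()
        # run_count is assumed to execute COUNT_CQL for this partition key
        # and return the integer count.
        counts[hotel_type] = run_count(hotel_type)
        elapsed = time.monotonic() - started
        # Pause so that consecutive queries are at least min_interval apart.
        if elapsed < min_interval:
            sleep(min_interval - elapsed)
    return counts
```

With the DataStax Python driver, I assume run_count could be something like lambda ht: session.execute(prepared, [ht]).one()[0], where prepared = session.prepare(COUNT_CQL). Each query still has to read the whole partition on the replicas, so the throttle only spreads the load out; it does not make a large partition cheaper to count.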
Update:
I saw that dsbulk can be used for counting. How is dsbulk count different from the CQL count()?