Cassandra (DSE) - Need suggestion on using PER PARTITION LIMIT on huge data

Question

I have a table with around 4M partitions, and each partition contains 4 rows, so the table holds about 16M rows in total (wide columns). Since the table stores time series data, we only need the latest row (version) for each partition_key. I can get the desired result with the query below. However, it puts load on the cluster and is time-consuming. Is there a better way to achieve this, or is this the only way?

SELECT some_value FROM some_table PER PARTITION LIMIT 1;

Have you considered creating bigger partitions, rather than only 4 rows per partition, to reduce the load on the cluster? Of course, some other mechanism would then be needed to limit the output, but the load could be smaller. - Matus Danoczi, Jan 10 '20 at 19:35

score 0 · Answer 1 · answered Jan 10 '20 at 06:36


Using PER PARTITION LIMIT won't have an impact on performance. In fact, it's an efficient way to get what you need from each partition, since only the first row is returned and Cassandra doesn't need to iterate over the other rows in the partition. Cheers!
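
Note that "first row" means first in clustering order, so the query only returns the latest version if the rows in each partition are sorted newest first. The question doesn't show the schema, so here is a minimal sketch of what that could look like. The column names partition_key and event_time and their types are assumptions for illustration, not the actual table definition:

```cql
-- Hypothetical schema: column names and types are assumed, not taken from the question.
CREATE TABLE IF NOT EXISTS some_table (
    partition_key text,
    event_time    timestamp,
    some_value    text,
    PRIMARY KEY ((partition_key), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);  -- newest row is stored first in each partition

-- With the descending clustering order above, the first row of each partition is the latest one.
SELECT partition_key, some_value FROM some_table PER PARTITION LIMIT 1;
```

If the table is clustered in ascending order instead, the same query would return the oldest row of each partition rather than the latest.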


Erick Ramirez

