
I recently went through a tutorial about the key cache and the row cache. Can anyone give me some real-world examples of where these caches make a difference? And what is the impact of increasing their sizes in the config file?

Running DESC TABLE, I found this:

AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
Prashant Pimpale
  • Here is what I have seen: the partition cache, if possible, should be able to contain all values. I've seen partition misses cause extra, undesirable latency. As for the row cache, I've seen real-life examples of having it enabled causing major slowdowns. I have not really been able to explain why, but I did open a ticket with DataStax and they told me to turn it off (for that very reason). When I disabled it, things ran much better. I'm sure there are some cases where the RC is worth it, but I haven't found one yet for us. – Jim Wartnick Mar 22 '19 at 15:21

1 Answer


Your main concern is the memory profile of your application.

This diagram shows how the key cache optimises the read path: it lets us skip the partition summary and partition index and go straight to the compression offset. As for the row cache, if you get a hit, you already have your answer and don't need to go down the read path at all.

[Diagram: the Cassandra read path, showing the shortcuts taken on a key cache hit and a row cache hit]

Key cache - The key cache is on by default because it keeps only the key of each row. Keys are typically small relative to the rest of the row, so this cache can hold many entries before it's exhausted.
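The key cache is tuned in cassandra.yaml. As a sketch (the values below are illustrative, not recommendations), the relevant settings look like this:

```yaml
# cassandra.yaml - key cache settings (illustrative values)

# Maximum size of the key cache in memory. If left empty, Cassandra
# defaults to the smaller of 5% of the heap or 100 MB.
key_cache_size_in_mb: 100

# How often (in seconds) the key cache is saved to disk so it can be
# re-warmed after a restart. Default is 14400 (4 hours).
key_cache_save_period: 14400
```

Increasing key_cache_size_in_mb lets more keys stay resident, at the cost of heap; since entries are small, this is usually the safer cache to grow.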

Row cache - The row cache holds entire rows and is useful when you have a fairly static querying pattern. The argument for the row cache is that if you read the same rows over and over, you can keep them in memory rather than going down to the SSTable (storage) level, and thus bypass an expensive seek on the read path. In practice, the memory-related slowdowns caused by using the row cache in non-optimal use cases make it an unpopular feature.
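To illustrate, enabling the row cache takes two steps: the table's caching option must allow rows to be cached, and a global budget must be set in cassandra.yaml. A hedged sketch, using a hypothetical keyspace and table name:

```sql
-- Cache at most the first 100 rows of each partition for this table.
-- 'ALL' is also valid for rows_per_partition, but is riskier with wide partitions.
ALTER TABLE my_keyspace.my_table
  WITH caching = {'keys': 'ALL', 'rows_per_partition': '100'};
```

```yaml
# cassandra.yaml - the row cache is disabled unless this is > 0 (illustrative value)
row_cache_size_in_mb: 200
```

Note that the `'{"keys":"ALL", "rows_per_partition":"NONE"}'` string in your DESC TABLE output is the older JSON-style syntax; newer Cassandra versions render it as the map shown above, and `NONE` means no rows are cached for that table.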

So what happens if you fill up a cache? There is an eviction policy, but if you're constantly kicking entries out of either cache to make room for new ones, the cache won't be of much use, and the GC-related performance degradation will hurt overall performance.

What about setting very high cache values? This is where better alternatives exist (more on that below). Making the row cache huge just leads to GC issues, which, depending on your workload, typically results in a net loss in overall performance.

One idea I've seen utilised relatively well is putting a caching layer on top of Cassandra, such as Apache Ignite or Memcached. You load hot data into the caching layer to get fast reads, and your application writes to the cache layer first, then to C* for persistence. These architectures come with plenty of their own headaches, but if you want to cache data for lower query latencies, the C* row cache isn't the best tool for the job.
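The read/write flow described above is essentially the cache-aside pattern. A minimal, self-contained sketch in Python, where plain dicts stand in for the real clients (pymemcache, cassandra-driver) so it runs as-is:

```python
# Cache-aside sketch: dicts stand in for Memcached and Cassandra.
cache = {}  # stand-in for the caching layer (e.g. Memcached)
db = {}     # stand-in for C* (the persistent store)

def write(key, value):
    """Write to the cache layer, then to C* for persistence."""
    cache[key] = value
    db[key] = value

def read(key):
    """Try the cache first; on a miss, fall back to C* and warm the cache."""
    if key in cache:
        return cache[key]      # hit: no trip down the C* read path
    value = db.get(key)        # miss: the expensive read
    if value is not None:
        cache[key] = value     # warm the cache for subsequent reads
    return value
```

The "headaches" mentioned above show up here too: this sketch ignores cache invalidation on concurrent writers, eviction, and what happens if the write to C* fails after the cache write succeeds.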

Lyuben Todorov