
I know that setting a very short TTL in Cassandra is a bad practice, but I want a deeper understanding of why that is the case. I have a table whose primary key consists of 2 fields. When writing a record to Cassandra with a TTL of 1, a lot of records end up preserving data only for the primary key fields, while the other columns end up being NULL. This is a tricky situation in Cassandra, since it is hard to query on NULL values and clean such records up.
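
Roughly what this looks like (table, column, and value names are made up for illustration):

    -- hypothetical table with a two-column primary key
    CREATE TABLE IF NOT EXISTS events (
        device_id  text,
        event_time timestamp,
        payload    text,
        status     text,
        PRIMARY KEY (device_id, event_time)
    );

    -- write with a 1-second TTL; these are the records that later show up
    -- with data only in the primary key columns
    INSERT INTO events (device_id, event_time, payload, status)
    VALUES ('sensor-1', '2023-05-04 10:00:00+0000', 'raw reading', 'NEW')
    USING TTL 1;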

What is a safe value for TTL in Cassandra? (I assume it depends on the replication configuration.) And how does one clean up the table when it has millions of rows with NULL data?

mboronin
  • Welcome to Stack Overflow! A friendly reminder that this site is for getting help with coding, algorithm, or programming language problems so I voted to have your post moved to [DBA Stack Exchange](https://dba.stackexchange.com/questions/ask?tags=cassandra). For future reference, you should post DB admin/ops questions on https://dba.stackexchange.com/questions/ask?tags=cassandra. Cheers! – Erick Ramirez May 04 '23 at 10:14
  • What exactly is the use case here? If we need to set such a short TTL value, why bother inserting it in first place? Could you please provide additional context here of the problem that you're trying to solve for? – Madhavan May 04 '23 at 12:18

3 Answers


The reason a short TTL is bad is that you open yourself up to reading a ton of tombstones. And what most people do when they read a lot of tombstones is lower gc_grace_seconds, which then opens you up to possible data resurrection.

A valid, or good, TTL is one that helps with whatever issue you're addressing, but that also takes into account whether a read of that data will scan 200 tombstones or 20,000. If you never read the data back, then it probably won't matter at all, other than that you might be taking up a lot of physical disk for deleted data.
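
To make that concrete, this is the knob people usually reach for, and why lowering it is risky (keyspace and table names here are made up):

    -- gc_grace_seconds is how long tombstones are kept before compaction
    -- may purge them (the default is 864000 seconds = 10 days)
    ALTER TABLE my_ks.events WITH gc_grace_seconds = 3600;

    -- caution: if a node is down for longer than gc_grace_seconds and misses
    -- the deletion, repairing it after the tombstone has been purged can
    -- bring the deleted data back (resurrection)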

stevenlacerda

When you write data with a TTL, two things happen:

  1. the data is inserted into the table, and
  2. a tombstone is inserted marking the data for deletion at a future date.

Since Cassandra has a distributed architecture with nothing shared between nodes, managing deletions is a bit complicated. Cassandra stores the tombstones in memory so that when the data is requested by an application, the coordinator will not return data which is already expired.
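
You can watch the expiry happen from cqlsh. A minimal sketch (table and column names are illustrative):

    -- write a value with a short TTL
    INSERT INTO events (device_id, event_time, payload)
    VALUES ('sensor-1', '2023-05-04 10:00:00+0000', 'temporary')
    USING TTL 30;

    -- TTL() reports the seconds remaining before the cell expires
    SELECT device_id, event_time, payload, TTL(payload)
    FROM events
    WHERE device_id = 'sensor-1';

    -- once the TTL elapses, the same SELECT no longer returns the value;
    -- the expired cell remains as a tombstone until gc_grace_seconds has
    -- passed and compaction purges it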

If a cluster is busy, a very low TTL means that tombstones pile up quickly in memory, which can significantly affect the performance of the cluster.

Additionally, if the table has clustering columns (each partition contains 1 or more rows), it is possible that Cassandra has to iterate over lots of deleted rows before it can get to the live ones (depending on your data model). This can trigger a TombstoneOverwhelmingException and cause requests to time out.
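
One way to see how many tombstones a given query touches is to enable request tracing in cqlsh (the trace output shown in the comments below is paraphrased, not exact):

    TRACING ON

    SELECT * FROM events WHERE device_id = 'sensor-1';

    -- the trace includes a line along the lines of
    --   "Read 3 live rows and 12000 tombstone cells"
    -- large tombstone counts are the warning sign; once a read scans more
    -- than tombstone_failure_threshold (set in cassandra.yaml) it is aborted

    TRACING OFF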

I would recommend having a look at How data is deleted in Cassandra to get a good understanding of the subject. Cheers!

Erick Ramirez

Answering this based on what I found on the web myself:

Setting a TTL (Time To Live) of only one second can lead to unexpected behavior in Cassandra due to the distributed nature of the database. When data is written to a Cassandra node with a TTL of one second, it is marked for deletion as soon as that second has elapsed. However, it may take some time for this deletion to propagate to all nodes in the cluster, which can lead to inconsistencies in the data.

In this case, it seems that some nodes in the cluster may have received the delete signal before others, resulting in some records being deleted while others were not. This can lead to situations where only the primary key fields are preserved, while the rest of the data is null.

The safe value for TTL in Cassandra depends on the specific use case and data retention requirements. In general, it is recommended to use TTL values of at least a few minutes, to allow enough time for the delete signal to propagate to all nodes in the cluster.
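
In practice that can mean writing with a TTL of a few minutes, or giving the table a default TTL instead of a one-second per-write TTL. A sketch with made-up names:

    -- every write to this table expires after 10 minutes unless the
    -- statement overrides it with its own USING TTL
    ALTER TABLE events WITH default_time_to_live = 600;

    -- or equivalently at table creation time
    CREATE TABLE sessions (
        session_id text PRIMARY KEY,
        payload    text
    ) WITH default_time_to_live = 600;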

mboronin