Is it possible to avoid tombstone problems with Cassandra?

Question

I am writing code for a CMS using Cassandra as the database system.

One of the strengths of the CMS is that it pre-calculates all sorts of things using a backend computer that runs continuously against the data that changes in the CMS.

For example, the CMS tells the list system that a page was created or changed. The list system saves that information in a table called list. That information is just a one-liner that tells me which page has to be worked on.

Column family: list
   Row: concerned website (e.g. http://www.example.com/)
     Column: full URI (e.g. http://www.example.com/this/page)
        Value: true (because you need something for the column to exist)

Every so often (most often less than a second after a simple page edit), the list backend wakes up, sees that a certain page changed, and works on it by updating all the lists that include (or no longer include) that page as an element. This lets the front end know the number of elements in a list instantly and read lists very quickly, without running complex queries at the time the list is needed (as opposed to what many CMSes do with SQL...).
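As a rough sketch of that workflow, assuming a plain Python dictionary stands in for the `list` column family, the front end adds a column and the backend drains the row. The names `list_cf`, `mark_page_changed` and `process_pending` are hypothetical; they are not part of the CMS or of any Cassandra driver.

```python
from collections import defaultdict
from typing import Callable

# Hypothetical in-memory stand-in for the "list" column family:
# row key (website) -> {column name (page URI) -> value}
list_cf: defaultdict = defaultdict(dict)


def mark_page_changed(website: str, uri: str) -> None:
    """Front end: record that `uri` must be (re)processed."""
    list_cf[website][uri] = True  # the value is irrelevant; the column must exist


def process_pending(website: str, update_lists: Callable[[str], None]) -> list:
    """Backend: handle every pending page of `website`, then delete its entry."""
    done = []
    for uri in sorted(list_cf[website]):  # columns in a row come back in comparator order
        update_lists(uri)                 # refresh lists that (no longer) contain uri
        del list_cf[website][uri]         # in Cassandra, this delete writes a tombstone
        done.append(uri)
    return done
```

Every page that goes through this loop ends with a delete, which is exactly the part that turns into tombstones on the Cassandra side.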

In effect, I am using the list table as a TODO list: a set of pages I have to work on. The front end adds page references to that list, and the backend deletes them once it is done with them. As a result, I can end up with a very large number of tombstones in the list table. The real-world effect: I had tombstone failures and the system started failing in random places. Once the list stops working, many other things in the system stop working too, and the websites become unusable.
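To see why a queue of this kind hurts, here is a simplified model (not Cassandra's real read path) of a row in which deleted columns leave tombstones behind. A slice read for pending pages has to step over every tombstone in its range; past Cassandra's `tombstone_failure_threshold` (100000 by default in cassandra.yaml) the read is aborted, which typically surfaces as a TombstoneOverwhelmingException. The `Row` class and the `TooManyTombstones` exception are hypothetical names for this sketch.

```python
from dataclasses import dataclass, field

TOMBSTONE = object()  # marker that replaces a deleted column

# Default of tombstone_failure_threshold in cassandra.yaml; the read model
# below is a simplification, not Cassandra's actual read path.
FAILURE_THRESHOLD = 100_000


class TooManyTombstones(Exception):
    """Stands in for the error a read hits past the failure threshold."""


@dataclass
class Row:
    cells: dict = field(default_factory=dict)

    def insert(self, column: str) -> None:
        self.cells[column] = True

    def delete(self, column: str) -> None:
        # A delete does not remove the column: it writes a tombstone that stays
        # until gc_grace_seconds has passed AND a compaction purges it.
        self.cells[column] = TOMBSTONE

    def slice(self, limit: int) -> tuple:
        """Return up to `limit` live columns plus the number of tombstones scanned."""
        live, scanned = [], 0
        for column in sorted(self.cells):
            if self.cells[column] is TOMBSTONE:
                scanned += 1
                if scanned > FAILURE_THRESHOLD:
                    raise TooManyTombstones(f"scanned {scanned} tombstones")
                continue
            live.append(column)
            if len(live) == limit:
                break
        return live, scanned
```

Because the front end keeps adding and the backend keeps deleting in the same row, the number of tombstones each slice has to walk over grows with every processed page until compaction finally purges them.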

I decreased the time it takes Cassandra to get rid of tombstones in that specific table (and a few others), but I am wondering whether I'm using Cassandra as intended. Is there a better way to handle a TODO list of this sort in this environment?
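Assuming the setting lowered here is the table's `gc_grace_seconds` (the comments below the answer mention setting it to 3600), the arithmetic can be sketched as follows. This is a simplified model: `purgeable` and `live_tombstones` are hypothetical helpers, and Cassandra additionally checks that no SSTable outside the compaction may still hold data the tombstone shadows.

```python
# Cassandra's default gc_grace_seconds for a table: 10 days.
DEFAULT_GC_GRACE_SECONDS = 864_000


def purgeable(deleted_at: float, compacted_at: float,
              gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS) -> bool:
    """Whether a tombstone written at `deleted_at` is old enough to be dropped by a
    compaction running at `compacted_at` (both Unix seconds). Simplified model."""
    return deleted_at + gc_grace_seconds < compacted_at


def live_tombstones(deletes_per_second: float, gc_grace_seconds: int) -> int:
    """Rough steady-state number of tombstones kept around for a constant delete
    rate, assuming compaction purges them as soon as they become eligible."""
    return int(deletes_per_second * gc_grace_seconds)
```

At one delete per second concentrated in a single website row, the default keeps roughly 864,000 tombstones in that row, far beyond the default failure threshold of 100,000, whereas 3600 seconds keeps about 3,600. The trade-off is that a replica that missed a delete and is not repaired within `gc_grace_seconds` can bring the deleted data back, which is why zero is a bad value.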

As a side note: the TODO list may be worked on from several different backend computers. On a small system, you are likely to have only one backend running against the list data; on larger systems with thousands of users, you are likely to have 2 or 3 backends just to handle lists. So having the data in Cassandra is a very practical way to share it quickly between computers.

If you are writing a new application, you should probably avoid Thrift; it's deprecated. – Chris Lohfink, Mar 27 '16 at 01:04
@ChrisLohfink, I started with Cassandra 0.8, but we are working on moving to CQL with Cassandra 3.x instead of Thrift. That being said, I would still like to know whether sorting works differently or not... – Alexis Wilke, Mar 27 '16 at 01:40

score 3 · Accepted Answer · answered Mar 27 '16 at 02:30


You have essentially implemented a queue, which is considered an anti-pattern for Cassandra: http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets

There are workarounds that people use to make queues behave better, but it's a hard game to play. Be sure to use LeveledCompactionStrategy rather than the default (SizeTieredCompactionStrategy); this will help a lot with smaller workloads. Consider workarounds such as time-boxing the partitions (rows, in old Thrift terminology) and the ones described in the article linked above, but you may want to look for a different solution.
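One way to read "time-boxing the partitions" is to add a time bucket to the partition key, so each partition only receives writes for a bounded window. A minimal sketch, assuming a 60-second bucket and a backend that remembers the last bucket it fully processed (both hypothetical choices, as are the function names):

```python
# Hypothetical bucket width; the right value depends on the write rate.
BUCKET_SECONDS = 60


def bucket_of(ts: float, bucket_seconds: int = BUCKET_SECONDS) -> int:
    """Index of the time box that a Unix timestamp falls into."""
    return int(ts // bucket_seconds)


def partition_key(website: str, ts: float) -> tuple:
    """Hypothetical composite partition key: (website, time bucket).
    A page change at time `ts` is written into this partition."""
    return (website, bucket_of(ts))


def buckets_to_scan(last_finished: int, now: float) -> range:
    """Buckets the backend still has to read: every bucket after the last one it
    fully processed, up to and including the current one. Older buckets are never
    read again, so the tombstones they contain are never scanned."""
    return range(last_finished + 1, bucket_of(now) + 1)
```

Since the backend only reads buckets it has not finished, a finished bucket can be removed with a single partition-level delete (one tombstone) instead of one tombstone per page. Pages can still land in the current bucket while it is being processed, so the "last finished" marker should only move past buckets whose time window has closed.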

Chris Lohfink

*"The queue example might be extreme"* -- except that's exactly what is problematic for us... our sessions table has a similar problem, albeit not as bad as a real full queue. – Alexis Wilke Mar 27 '16 at 03:06
Lowering your gc_grace_seconds may be a good idea too, but setting it to zero is bad since you could lose deletes. – Chris Lohfink Mar 27 '16 at 03:14
Yeah, I put it at 3600 for a few tables... at this point, it does not seem to cause problems, but we'll have to see how it goes with 3.x once we have that in place. – Alexis Wilke Mar 27 '16 at 04:28
@AlexisWilke a suggestion for your session table, assuming that you write the complete session state to the table each time: do immutable inserts by adding a timeuuid clustering column ordered with the newest change first, where the timeuuid is the current time of the update. Then, when you want the current session state, you can do a LIMIT 1 and get the current value (you may have to tune consistency levels). To clean up data, you could probably use a TTL on the PK for the session too. This avoids your tombstone issues and having to read multiple SSTables to get the latest session data. – fromanator Mar 28 '16 at 05:16
Actually, I think I'm fine with sessions because I always have the row key, so I do not have to query a slice. I can read the data in one query, and I would imagine tombstones are fine here. The only thing is that it can grow quite a bit, and my worry is that I reach a point where Cassandra decides to stop compacting because it has too many tombstones. Then the table would continue to grow forever... – Alexis Wilke Mar 28 '16 at 05:43
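fromanator's suggestion can be modelled without a cluster. The sketch below is an in-memory stand-in, not driver code: the table layout, the names `SessionTable` and `timeuuid_at`, and the TTL value are assumptions for illustration. A timeuuid is a version-1 UUID, so ordering by it is ordering by the time it encodes.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional

# 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def timeuuid_at(ts: float, clock_seq: int = 0, node: int = 0) -> uuid.UUID:
    """Version-1 (time-based) UUID for Unix time `ts`, the kind CQL's timeuuid holds."""
    t = int(ts * 10_000_000) + UUID_EPOCH_OFFSET
    return uuid.UUID(
        fields=(t & 0xFFFFFFFF, (t >> 32) & 0xFFFF, (t >> 48) & 0x0FFF,
                (clock_seq >> 8) & 0x3F, clock_seq & 0xFF, node),
        version=1,  # sets the version nibble and the RFC 4122 variant bits
    )


@dataclass
class SessionTable:
    """In-memory model of a hypothetical CQL table:
    PRIMARY KEY (session_id, changed_at) WITH CLUSTERING ORDER BY (changed_at DESC),
    every row written with a TTL of `ttl_seconds`."""
    ttl_seconds: int
    rows: dict = field(default_factory=dict)  # session_id -> [(changed_at, expires_at, state)]

    def write(self, session_id: str, state: dict, now: float) -> None:
        # Immutable insert: append a new version, never update or delete in place.
        versions = self.rows.setdefault(session_id, [])
        versions.append((timeuuid_at(now), now + self.ttl_seconds, state))
        versions.sort(key=lambda v: v[0].time, reverse=True)  # newest change first

    def current(self, session_id: str, now: float) -> Optional[dict]:
        """Like SELECT ... WHERE session_id = ? LIMIT 1: the newest unexpired version."""
        for _changed_at, expires_at, state in self.rows.get(session_id, []):
            if now < expires_at:  # expired (TTL'd) versions are no longer returned
                return state
        return None
```

Each update is a new row rather than an overwrite or a delete, and the LIMIT 1 read stops at the newest version instead of walking through older ones. Versions whose TTL has passed are still turned into tombstones and kept for gc_grace_seconds, so the TTL mainly bounds how much history piles up rather than eliminating tombstones altogether.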
