I am always inserting data into a table with PRIMARY KEY ((site_name, date), time, id). The site_name and date can repeat, but time (a timestamp field) and id (a uuid) are always different, so every write adds new data. Data is inserted with a TTL (currently 3 days). Since I never delete or update, can I disable compaction, considering the TTL is there? Would it affect anything? Also, as no record is ever deleted, can I disable the gc_grace time? I want to put as little load on the servers as possible. I would much appreciate any help.
3 Answers
TTLs create tombstones. As such, compaction is required. If your data is time series data, you might consider the new date tiered compaction: http://www.datastax.com/dev/blog/datetieredcompactionstrategy .
If you use TTLs and set gc_grace to 0, you're asking for trouble unless your cluster has a single node. The grace period is the amount of time to wait before collecting tombstones. If it's 0, it won't wait. This may sound good, but in reality it means the "deletion" might not propagate across the cluster, and the deleted data may reappear (because other nodes may still have it, and the last present value will "win"). This type of data is called zombie data. Zombies are bad. Don't feed the zombies.
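To make the setting discussed above concrete: gc_grace_seconds is a per-table option in CQL. This is only a sketch. The table name site_events is hypothetical, and 864000 seconds (10 days) is Cassandra's default, shown purely for illustration. Whatever value is chosen should stay longer than the interval between repairs.

```cql
-- Hypothetical table name; 864000 s = 10 days, the Cassandra default.
ALTER TABLE site_events WITH gc_grace_seconds = 864000;
```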
You can disable auto compaction: http://www.datastax.com/documentation/cassandra/2.1/cassandra/tools/toolsDisableAutoCompaction.html . But again, I doubt you'll gain much from this. Again, look at date tiered compaction.
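A minimal sketch of what the date tiered suggestion could look like for the schema in the question, assuming a Cassandra version that ships DateTieredCompactionStrategy. The table name site_events and the column types are assumptions, since the question only gives the primary key. 259200 seconds is the 3-day TTL mentioned there.

```cql
-- Hypothetical table name and column types; only the primary key
-- layout comes from the question.
CREATE TABLE site_events (
    site_name text,
    date text,
    time timestamp,
    id uuid,
    PRIMARY KEY ((site_name, date), time, id)
) WITH default_time_to_live = 259200   -- 3 days * 24 h * 3600 s
  AND compaction = {'class': 'DateTieredCompactionStrategy'};

-- The TTL can also be set per write instead of as a table default.
INSERT INTO site_events (site_name, date, time, id)
VALUES ('example.com', '2015-01-08', '2015-01-08 02:15:00+0000', uuid())
USING TTL 259200;
```

With a table-level default_time_to_live, the USING TTL clause is only needed when a write should expire on a different schedule.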
-
I will definitely look into the date tiered compaction strategy. If I set grace to 0, then the only consequence is that zombie data would be created. In my case I don't mind if old data is still there. I only care about the daily sites (that's why the date is in the compound primary key). I only read data (the count of websites on that date). So in my case, is setting the grace period to 0 beneficial or not? (I can run the node repair tool once a week.) Also, I want to experiment with the key cache, changing its size etc. What's the best way to check whether performance improved or decreased? Thanks a lot. – Mark Jan 08 '15 at 02:15
-
Zombies won't get deleted. And reincarnated data may eventually be propagated as new data with repairs. Compaction isn't only about deletes. With many storage files, your reads may need to hit many of them. Compaction reduces this effect. So without it, reads may get slower. – ashic Jan 08 '15 at 07:14
-
Increasing the key cache can help, as might increasing bloom filters. You can use the stress tool (cassandra-stress) and compare numbers. I believe the latest version allows you to specify your own schema, etc. – ashic Jan 08 '15 at 07:15
-
ashic: if every row has the same TTL across the cluster, and the data is not touched after it has been inserted, all nodes will hit the TTL at the same time, thus not needing grace. Remember: afaik grace 0 does actually create tombstones, only they are always deleted in the first compaction. SSTables are always immutable. – polve Jan 08 '15 at 08:40
-
Yes... in fact, if TTL > gc grace, then tombstones aren't created at all. https://issues.apache.org/jira/browse/CASSANDRA-4917 However, compaction is still necessary to reclaim disk space because, as you say, SSTables are immutable. – ashic Jan 08 '15 at 10:03
-
Good that we are in agreement. To clarify, I have never suggested turning off compaction, only setting gc grace to zero when using TTL with no updates, only inserts. – polve Jan 08 '15 at 14:11
You can permanently disable autocompaction on individual tables (column families) with CQL like this. SizeTieredCompactionStrategy is shown here; use whichever strategy the table already has:
ALTER TABLE <tablename> WITH compaction = {'class': 'SizeTieredCompactionStrategy', 'enabled': 'false'};
Setting 'enabled': 'false' disables autocompaction on that table until you change it back, but you can still run a manual compaction whenever you like with the 'nodetool compact' command.

You can set gc grace to 0, but not turn off compaction. If you never delete or update I think you might be able to turn off compaction.
Edit: Optimizations in C* from 2.0 and onwards exactly for this case: https://issues.apache.org/jira/browse/CASSANDRA-4917
About TTL, tombstones and GC Grace http://mail-archives.apache.org/mod_mbox/cassandra-user/201307.mbox/%3CCALY91SNy=cxvcGJh6Cp171GnyXv+EURe4uadso1Kgb4AyFo09g@mail.gmail.com%3E

-
So setting gc grace to 0 would improve my reads in my current situation? That's what I am hoping to achieve. – Mark Jan 07 '15 at 17:40
-
You can turn off auto compaction, but it won't really help in this case. TTLs do create tombstones, and so it won't be a good idea to turn off grace unless it's a single node cluster. – ashic Jan 07 '15 at 23:22
-
It's an eight-node cluster. I only care about a website's total count on a daily basis. Even if zombie data is created, it won't affect my reads because I am not reading data with an old date. Also, will compaction affect anything, given that I am not updating anything, only ever inserting? – Mark Jan 08 '15 at 02:18
-
This will improve reads, as you will have fewer tombstones lying around and so need fewer SSTable searches. Learn how to read the output of 'nodetool cfhistograms'. Check the 'SSTables per Read' stat; it describes how many SSTables a read must go through. – polve Jan 08 '15 at 08:47
-
Compaction, though, is not a very intelligent task so far. Especially in my case, where my hard drives are already trickling with data, it's useless. I hope that compaction is optimized in future versions. – Aurangzeb Apr 27 '18 at 07:06