Cassandra remove row keys after deletion

Question

We have a 5-node Cassandra cluster in production, with all nodes running Cassandra 2.0.6. The cluster stores user interactions on pages in a column family. The data model looks like this:

Row Key:
20140101:http://example.com/myurlpath?myquery=1

Columns:
Counters
X:Y:Type => Counter Value

Since the data is essentially a stream of data points, we have a separate cron job that actively deletes rows [removes all columns] that are more than n weeks old. Although our deletion cron empties the older rows, their row keys still stay in the system [e.g. there is still a row key with timestamp 20130517].
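To make the deletion rule concrete, here is a minimal sketch of how a cron job might decide which row keys are older than n weeks. The helper names (row_day, cutoff_day, is_expired) are hypothetical and not part of our actual cron, and the date arithmetic assumes GNU date.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the deletion cron; all names are illustrative only.

# Print the day prefix (YYYYMMDD) of a row key such as
# "20140101:http://example.com/myurlpath?myquery=1".
row_day() {
    # Strip everything from the first ':' onward.
    printf '%s\n' "${1%%:*}"
}

# Print the cutoff day: N weeks before a reference date (default: today).
# Assumes GNU date (the -d option).
cutoff_day() {
    local weeks="$1" ref="${2:-today}"
    date -u -d "${ref} -${weeks} weeks" +%Y%m%d
}

# Succeed (exit status 0) when the row key's day is before the cutoff day.
is_expired() {
    local key="$1" cutoff="$2"
    [ "$(row_day "$key")" -lt "$cutoff" ]
}
```

For example, is_expired "20130517:http://example.com/myurlpath?myquery=1" "$(cutoff_day 4)" would succeed for any run date after mid-June 2013, so that row would be selected for deletion.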

I checked SO posts here and here, and also the Cassandra forum. There is no clear resolution in the answers. I understand distributed deletes and tombstones, but this row key issue is still a mystery to me.

I tried forcing a major compaction and a cleanup, but nothing changed. Because of this, the memory used by our Cassandra cluster is constantly increasing, as our row keys are large [120 B on average].

We have left the gc_grace setting of the column families at the default of 10 days. If that were the issue, we should at least not see row keys older than a year [which are very frequently present]; keys that are at most a month or two old would be fine.
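The reasoning about gc_grace can be written down as simple arithmetic. This is only a simplified model of when a tombstone becomes eligible for purging, not Cassandra's real logic; the function name and its arguments (epoch seconds) are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Simplified model: a tombstone becomes eligible for purging by compaction
# only after gc_grace_seconds have passed since the deletion.

GC_GRACE_SECONDS=864000   # the default: 10 days * 24 h * 3600 s

# tombstone_purgeable DELETED_AT NOW [GRACE]
# All values are in seconds since the epoch. Succeeds when the grace period
# has fully elapsed. Eligibility alone does not guarantee removal; a
# compaction still has to process the tombstone.
tombstone_purgeable() {
    local deleted_at="$1" now="$2" grace="${3:-$GC_GRACE_SECONDS}"
    [ $(( deleted_at + grace )) -lt "$now" ]
}
```

Under this model, a row deleted more than a year ago is far past the 10-day window, which is why gc_grace alone does not seem to explain the leftover keys.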

How should we manage row key removal in Cassandra?

Check the date and time on all the nodes. I ran into the same problem, and in my case all the nodes had different dates and times. – undefined_variable, Sep 14 '14 at 17:07

score 0 · Answer 1 · edited Sep 08 '14 at 10:53

Use nodetool, Cassandra's command-line utility for managing the cluster.

CLI:

1: cd C:\Program Files\DataStax Community\apache-cassandra\bin

2: nodetool -h localhost flush KeySpace Table

3: Wait 2-3 minutes (the "magical wait").

4: nodetool -h localhost compact KeySpace Table
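On Linux, the same flush, wait, compact sequence could be scripted roughly as follows. This is a sketch only: the DRY_RUN and WAIT_SECONDS variables and the function names are hypothetical, and the 150-second default just mirrors the 2-3 minute wait above.

```shell
#!/usr/bin/env bash
# Sketch: flush memtables, pause, then force a major compaction.
# DRY_RUN and WAIT_SECONDS are hypothetical knobs added for illustration.

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# flush_then_compact KEYSPACE TABLE
flush_then_compact() {
    local keyspace="$1" table="$2" wait_seconds="${WAIT_SECONDS:-150}"
    run nodetool -h localhost flush "$keyspace" "$table" || return 1
    run sleep "$wait_seconds" || return 1
    run nodetool -h localhost compact "$keyspace" "$table"
}
```

Running DRY_RUN=1 flush_then_compact KeySpace Table prints the three commands without touching the cluster, which is a cheap way to check the script before putting it in cron.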

answered Sep 08 '14 at 10:51 by Robin Jain · edited Sep 08 '14 at 10:53 by ʰᵈˑ

Isn't this just to clear tombstones in the cassandra columns? – Tamil Sep 12 '14 at 11:25
No, it'll remove the row keys as well. You have to wait between flush & compact operations. – Robin Jain Sep 12 '14 at 17:57

score 0 · Answer 2 · answered Sep 09 '14 at 12:24

I use nodetool for maintenance of my Cassandra server, and it works fine for me. For this you need the flush, cleanup and repair commands. Write a shell script and run it from a cron job.

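#!/usr/bin/env bash
# Replace [keyspace] and [cfnames] with your keyspace and column family names.
. /etc/rc.d/init.d/functions
# Write memtables to SSTables on disk.
nodetool flush [keyspace] [cfnames]
# Drop cached keys and rows so stale entries are not served.
nodetool invalidatekeycache [keyspace] [cfnames]
nodetool invalidaterowcache [keyspace] [cfnames]
# Rebuild the SSTables, then synchronize data with the other replicas.
nodetool scrub [keyspace] [cfnames]
nodetool repair [keyspace] [cfnames]
# Remove keys this node no longer owns, then force a major compaction.
nodetool cleanup [keyspace] [cfnames]
nodetool compact [keyspace] [cfnames]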

Reference link is: NodeTool

score 0 · Accepted Answer · answered Feb 28 '15 at 17:27

http://www.slideshare.net/planetcassandra/8-axel-liljencrantz-23204252

As the presentation above says, Cassandra won't remove a row key if it is present in multiple SSTables. Although compaction exists for exactly this purpose, there is always a possibility that it does not happen [from slide 35].
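A toy model of that condition, as I understand it: a deleted row can only disappear when the compaction covers every SSTable that still holds the key. This is a deliberately simplified sketch, not Cassandra's actual code, and the function name and SSTable names are made up.

```shell
#!/usr/bin/env bash
# Toy model: a deleted row key can be dropped only if the compaction
# includes every SSTable that contains that key.

# can_drop_row "HOLDERS" "COMPACTED"
#   HOLDERS:   space-separated names of SSTables that contain the row key
#   COMPACTED: space-separated names of SSTables included in the compaction
can_drop_row() {
    local holder candidate found
    for holder in $1; do
        found=0
        for candidate in $2; do
            if [ "$holder" = "$candidate" ]; then
                found=1
            fi
        done
        # One holder outside the compaction is enough to keep the key alive.
        if [ "$found" -ne 1 ]; then
            return 1
        fi
    done
    return 0
}
```

So can_drop_row "sst1 sst2" "sst1 sst2 sst3" succeeds, while can_drop_row "sst1 sst2" "sst1" fails because sst2 still holds the key. In this model, a compaction that leaves out even one such SSTable keeps the row key around.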
