We have a 5-node Cassandra cluster in production, with all nodes running Cassandra 2.0.6. The cluster stores user interactions on pages in a column family. The data model looks like this:

Row Key:
20140101:http://example.com/myurlpath?myquery=1

Columns:
Counters
X:Y:Type => Counter Value

Since this is essentially a stream of data points, we have a separate cron job that actively deletes rows [removes all columns] that are more than n weeks old. Although our deletion cron empties older rows, the row keys still stay in our system [e.g. there is still a row key with timestamp 20130517].
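To make the age check concrete, here is a minimal sketch of how such a cron could decide which row keys are past the cutoff, assuming keys always start with a YYYYMMDD prefix followed by a colon. The function names, the 4-week cutoff, and the `/old` URL are made up for illustration; this is not our actual cron and it does not talk to Cassandra at all.

```python
from datetime import date, datetime, timedelta


def row_key_date(row_key: str) -> date:
    """Parse the leading YYYYMMDD prefix of a key like '20140101:http://...'."""
    # partition() splits on the FIRST colon only, so colons in the URL are safe.
    prefix, _, _ = row_key.partition(":")
    return datetime.strptime(prefix, "%Y%m%d").date()


def is_expired(row_key: str, today: date, max_age_weeks: int) -> bool:
    """True when the key's date prefix is more than max_age_weeks before today."""
    return row_key_date(row_key) < today - timedelta(weeks=max_age_weeks)


# Hypothetical sample keys; the second one is the example from the data model.
keys = [
    "20130517:http://example.com/old",
    "20140101:http://example.com/myurlpath?myquery=1",
]
expired = [k for k in keys if is_expired(k, date(2014, 1, 15), max_age_weeks=4)]
print(expired)  # ['20130517:http://example.com/old']
```

The rows selected this way are the ones whose columns get removed, and whose keys nevertheless remain visible afterwards.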

I checked SO posts here and here, and also the Cassandra forum. There is no clear resolution in the answers. I understand distributed deletes and tombstones, but this row key issue remains a mystery to me.

I tried forcing a major compaction and a cleanup; neither changed anything. Because of this, the memory used by our Cassandra cluster is constantly increasing, as our row keys are large [120 B on average].

We have left the gc_grace setting of the column families at the default of 10 days. If that were the issue, we should at least not see row keys more than a year old [these are very frequently present]; keys a month or two old at most would be fine.
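For reference, the arithmetic behind that expectation can be sketched as follows, assuming the default gc_grace_seconds of 864000 (10 days). The deletion date is invented for the example, and the function name is hypothetical. Note that this only gives the earliest moment a tombstone becomes eligible for removal; a compaction still has to actually drop it.

```python
from datetime import datetime, timedelta

# Cassandra's default gc_grace_seconds: 864000 seconds, i.e. 10 days.
DEFAULT_GC_GRACE_SECONDS = 864000


def earliest_purge_time(deleted_at, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """Earliest time a tombstone written at deleted_at may be purged."""
    return deleted_at + timedelta(seconds=gc_grace_seconds)


# Hypothetical deletion time: four weeks after the 20130517 row was written.
deleted_at = datetime(2013, 6, 14)
print(earliest_purge_time(deleted_at))  # 2013-06-24 00:00:00
```

By this reckoning a row deleted in mid-2013 should have been eligible for purging long ago, which is why gc_grace alone does not seem to explain what we see.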

How should we manage row key removal in Cassandra?

Tamil

3 Answers

Use nodetool, the command-line utility that ships with Cassandra, to manage the cluster.

CLI:

1: CD C:\Program Files\DataStax Community\apache-cassandra\bin

2: nodetool -h localhost flush KeySpace Table

3: Wait 2-3 minutes (the magical wait).

4: nodetool -h localhost compact KeySpace Table

Robin Jain

I use nodetool for my Cassandra server maintenance, and it works fine for me. For this you need the flush, cleanup, and repair commands. Write a shell script and run it as a cron job.

#!/usr/bin/env bash
. /etc/rc.d/init.d/functions
nodetool flush [keyspace] [cfnames]
nodetool invalidatekeycache [keyspace] [cfnames]
nodetool invalidaterowcache [keyspace] [cfnames]
nodetool scrub [keyspace] [cfnames]
nodetool repair [keyspace] [cfnames]
nodetool cleanup [keyspace] [cfnames]
nodetool compact [keyspace] [cfnames]

Reference link is: NodeTool

Rajesh Ujade

http://www.slideshare.net/planetcassandra/8-axel-liljencrantz-23204252

As the presentation above says, Cassandra won't delete a row key if it is present in multiple SSTables. Although compaction exists for exactly this purpose, there is always a possibility of it not happening [from slide 35].
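A toy model of that rule, as I understand it, is sketched below. It assumes gc_grace has already elapsed and treats each SSTable as a plain set of row keys; real Cassandra uses bloom filters and timestamps for this check, so the function and the sample keys are purely illustrative.

```python
def purgeable(row_key, compacting, others):
    """Toy rule: a deleted row can be dropped by a compaction only when the
    compaction touches the row and no SSTable outside it still holds the row."""
    in_compaction = any(row_key in sstable for sstable in compacting)
    held_elsewhere = any(row_key in sstable for sstable in others)
    return in_compaction and not held_elsewhere


# Hypothetical SSTables, each modelled as a set of row keys.
old_key = "20130517:http://example.com/old"
sstable_a = {old_key}
sstable_b = {old_key, "20140101:http://example.com/myurlpath?myquery=1"}

# Compacting only sstable_a leaves the key behind, because sstable_b has it too.
print(purgeable(old_key, compacting=[sstable_a], others=[sstable_b]))  # False

# Compacting both together lets the key go.
print(purgeable(old_key, compacting=[sstable_a, sstable_b], others=[]))  # True
```

In this simplified picture, a row key survives whenever at least one SSTable containing it is left out of the compaction.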

Tamil