Cassandra: high CPU usage and unresponsive DB, probably due to a stuck secondary index building - how to stop the index building process?

Question

In a single-node installation of Cassandra 3.7 on a VM running Debian, I have a table with about 20 million rows. In order to be able to select the data that was inserted during the last few days, I used Datastax DevCenter 1.6.0 to execute a statement to create an secondary index on a column containing the insertion date:

CREATE INDEX srsdata_datetimeinserted ON ccp.srsdata(datetimeinserted);

The statement itself ran quickly, and then, as I understand it, the index creation process started in hte background, with a close-to 100% CPU load on one of the cores. The problem is, this CPU load has been high now for over 24 hours and starts again even after multiple restarts of the virtual machine.

In order to check the index creation process, I have run

nodetool compactionstats

but almost from the beginning it seems to have stuck at 5.78% and hasn't changed at all during the last 24 hours:

pending tasks: 1
- ccp.srsdata: 1

id                                   compaction type       keyspace table   completed total    unit  progress
2616e5d0-c217-11e6-bbed-073889a74ba2 Secondary index build ccp      srsdata 655814    11350989 bytes 5.78%
Active compaction remaining time :   0h00m00s

I can SELECT from tables but not INSERT data, not even into other tables, I then get

"Cassandra timeout during write query at consistency ONE 
(1 replica were required but only 0 acknowledged the write)"

If I try to drop the index,

DROP INDEX srsdata_datetimeinserted;

I get

"Timed out waiting for server respones".

I have tried to stop the index building using

nodetool stop INDEX_BUILD

but it doesn't make any difference.

What can I do to stop and maybe restart the index creation? Or is there some other stuff running that I haven't though about?

Update 2017-01-12

I never got the index creation process to stop, so I ended up restoring the virtual server from a backup taken prior to creating the index.

I also found out about the new SASI indexes introduced in Cassandra 3.4 (http://www.doanduyhai.com/blog/?p=2058), and especially the SPARSE indexing mode is made for storing close-to-unique data such as millisecond timestamps. In fact, a maximum of 5 identical values is allowed. So I creataed a SASI index using

CREATE CUSTOM INDEX srsdata_datetimeinserted ON ccp.srsdata (datetimeinserted) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'mode': 'SPARSE' };

The creation took about 20 minutes and seems to work fine, now I can make queries like

select * from ccp.srsdata where datetimeinserted >= '2017-01-01 00:00:00+0000' AND datetimeinserted < '2017-01-01 15:00:00+0000';

I'd try modeling your table to *not* use a secondary index. And secondary indexes on high-cardinality columns (like date time) are an anti-pattern. Even if this index finished building, your queries on it would perform very badly. — Aaron, Dec 15 '16 at 13:44
I'm aware of that it is not a good pattern, but it would be a query that would run seldom, once in a few weeks or so, and I would really like to avoid creating and maintaining a separate table just for this query. I don't really need to query by exact timestamp on millisecond level, so I guess if I had a date column (YYYY-MM-DD) that I could index would be a better option with significantly less cardinality? — maccer, Dec 19 '16 at 20:26

Cassandra: high CPU usage and unresponsive DB, probably due to a stuck secondary index building - how to stop the index building process?

0 Answers0