In a single-node installation of Cassandra 3.7 on a VM running Debian, I have a table with about 20 million rows. In order to be able to select the data that was inserted during the last few days, I used Datastax DevCenter 1.6.0 to execute a statement to create an secondary index on a column containing the insertion date:
CREATE INDEX srsdata_datetimeinserted ON ccp.srsdata(datetimeinserted);
The statement itself ran quickly, and then, as I understand it, the index creation process started in hte background, with a close-to 100% CPU load on one of the cores. The problem is, this CPU load has been high now for over 24 hours and starts again even after multiple restarts of the virtual machine.
In order to check the index creation process, I have run
nodetool compactionstats
but almost from the beginning it seems to have stuck at 5.78% and hasn't changed at all during the last 24 hours:
pending tasks: 1
- ccp.srsdata: 1
id compaction type keyspace table completed total unit progress
2616e5d0-c217-11e6-bbed-073889a74ba2 Secondary index build ccp srsdata 655814 11350989 bytes 5.78%
Active compaction remaining time : 0h00m00s
I can SELECT from tables but not INSERT data, not even into other tables, I then get
"Cassandra timeout during write query at consistency ONE
(1 replica were required but only 0 acknowledged the write)"
If I try to drop the index,
DROP INDEX srsdata_datetimeinserted;
I get
"Timed out waiting for server respones".
I have tried to stop the index building using
nodetool stop INDEX_BUILD
but it doesn't make any difference.
What can I do to stop and maybe restart the index creation? Or is there some other stuff running that I haven't though about?
Update 2017-01-12
I never got the index creation process to stop, so I ended up restoring the virtual server from a backup taken prior to creating the index.
I also found out about the new SASI indexes introduced in Cassandra 3.4 (http://www.doanduyhai.com/blog/?p=2058), and especially the SPARSE indexing mode is made for storing close-to-unique data such as millisecond timestamps. In fact, a maximum of 5 identical values is allowed. So I creataed a SASI index using
CREATE CUSTOM INDEX srsdata_datetimeinserted ON ccp.srsdata (datetimeinserted) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'mode': 'SPARSE' };
The creation took about 20 minutes and seems to work fine, now I can make queries like
select * from ccp.srsdata where datetimeinserted >= '2017-01-01 00:00:00+0000' AND datetimeinserted < '2017-01-01 15:00:00+0000';