
Hi everyone,

A few days ago I upgraded our 6-node EC2 cluster from Cassandra 2.1.4 to 2.1.5.

Since then, all my nodes have "exploded" in their CPU usage: they're at 100% CPU for much of the time, and their load average is between 100 and 300 (!!!).

This did not start immediately after the upgrade. It started a few hours afterwards on one of the nodes, and slowly more and more nodes began to exhibit the same behavior. It seems to correlate with the compaction of our largest column family: once the compaction completes (~24 hours after it starts), the node appears to go back to normal. It has only been 2 days or so, so I'm hoping it will not happen again, but I am still monitoring this.
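
(For reference, compaction activity and backlog can be checked directly on a node with nodetool; the two commands below are standard nodetool subcommands, nothing specific to this cluster.)

    nodetool compactionstats      # compactions currently running, plus the pending task count
    nodetool compactionhistory    # recently completed compactions, with sizes and timestamps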

Here are my questions:

  1. Is this a bug or expected behavior?

If it is expected behavior -

  1. What is the explanation for this issue?
  2. Is it documented somewhere that I have missed?
  3. Should I do upgrades differently? Maybe 1 or 2 nodes at a time every 24 hours or so? What is the best practice?

If it is a bug -

  1. Is it known?
  2. Where should I report this? What data should I add?
  3. Will downgrading back to 2.1.4 work?

Any feedback on this would be great.

Thanks

Amir

Update:

This is the structure of the table in question:

CREATE TABLE tbl1 (
    key text PRIMARY KEY,
    created_at timestamp,
    customer_id bigint,
    device_id bigint,
    event text,
    fail_count bigint,
    generation bigint,
    gr_id text,
    imei text,
    raw_post text,
    "timestamp" timestamp
) WITH COMPACT STORAGE
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = 'NONE';
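
For context, per-node statistics for this table (SSTable count, space used, latencies) can be pulled with nodetool; `<keyspace>` below is a placeholder, and cfstats/cfhistograms are the 2.1-era names of these commands.

    nodetool cfstats <keyspace>.tbl1        # SSTable count, space used, read/write latencies
    nodetool cfhistograms <keyspace> tbl1   # SSTables touched per read and latency percentiles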

The logs do not reveal much (at least to me). Here's a snippet of what the logs look like:

INFO [WRITE-/10.0.1.142] 2015-05-23 05:43:42,577 YamlConfigurationLoader.java:92 - Loading settings from file:/etc/cassandra/cassandra.yaml

INFO [WRITE-/10.0.1.142] 2015-05-23 05:43:42,580 YamlConfigurationLoader.java:135 - Node configuration:[authenticator=AllowAllAuthenticator; authorizer=AllowAllAuthorizer; auto_snapshot=true; batch_size_warn_threshold_in_kb=5; batchlog_replay_throttle_in_kb=1024; broadcast_rpc_address=10.0.2.145; cas_contention_timeout_in_ms=1000; client_encryption_options=; cluster_name=Gryphonet21 Cluster; column_index_size_in_kb=64; commit_failure_policy=stop; commitlog_directory=/data/cassandra/commitlog; commitlog_segment_size_in_mb=32; commitlog_sync=periodic; commitlog_sync_period_in_ms=10000; compaction_throughput_mb_per_sec=16; concurrent_counter_writes=32; concurrent_reads=32; concurrent_writes=32; counter_cache_save_period=7200; counter_cache_size_in_mb=null; counter_write_request_timeout_in_ms=5000; cross_node_timeout=false; data_file_directories=[/data/cassandra/data]; disk_failure_policy=stop; dynamic_snitch_badness_threshold=0.1; dynamic_snitch_reset_interval_in_ms=600000; dynamic_snitch_update_interval_in_ms=100; endpoint_snitch=GossipingPropertyFileSnitch; hinted_handoff_enabled=true; hinted_handoff_throttle_in_kb=1024; incremental_backups=false; index_summary_capacity_in_mb=null; index_summary_resize_interval_in_minutes=60; inter_dc_tcp_nodelay=false; internode_compression=all; key_cache_save_period=14400; key_cache_size_in_mb=null; max_hint_window_in_ms=10800000; max_hints_delivery_threads=2; memtable_allocation_type=heap_buffers; native_transport_port=9042; num_tokens=16; partitioner=RandomPartitioner; permissions_validity_in_ms=2000; range_request_timeout_in_ms=10000; read_request_timeout_in_ms=5000; request_scheduler=org.apache.cassandra.scheduler.NoScheduler; request_timeout_in_ms=10000; row_cache_save_period=0; row_cache_size_in_mb=0; rpc_address=0.0.0.0; rpc_keepalive=true; rpc_port=9160; rpc_server_type=sync; saved_caches_directory=/data/cassandra/saved_caches; seed_provider=[{class_name=org.apache.cassandra.locator.SimpleSeedProvider, parameters=[{seeds=10.0.1.141,10.0.2.145,10.0.3.149}]}]; server_encryption_options=; snapshot_before_compaction=false; ssl_storage_port=7001; sstable_preemptive_open_interval_in_mb=50; start_native_transport=true; start_rpc=true; storage_port=7000; thrift_framed_transport_size_in_mb=15; tombstone_failure_threshold=100000; tombstone_warn_threshold=1000; trickle_fsync=false; trickle_fsync_interval_in_kb=10240; truncate_request_timeout_in_ms=60000; write_request_timeout_in_ms=2000]

INFO [HANDSHAKE-/10.0.1.142] 2015-05-23 05:43:42,591 OutboundTcpConnection.java:494 - Cannot handshake version with /10.0.1.142
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,713 MessagingService.java:887 - 135 MUTATION messages dropped in last 5000ms
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,713 StatusLogger.java:51 - Pool Name Active Pending Completed Blocked All Time Blocked
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,714 StatusLogger.java:66 - CounterMutationStage 0 0 0 0 0
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,714 StatusLogger.java:66 - ReadStage 5 1 5702809 0 0
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,715 StatusLogger.java:66 - RequestResponseStage 0 45 29528010 0 0
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,715 StatusLogger.java:66 - ReadRepairStage 0 0 997 0 0
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,715 StatusLogger.java:66 - MutationStage 0 31 43404309 0 0
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,716 StatusLogger.java:66 - GossipStage 0 0 569931 0 0
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,716 StatusLogger.java:66 - AntiEntropyStage 0 0 0 0 0
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,716 StatusLogger.java:66 - CacheCleanupExecutor 0 0 0 0 0
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,717 StatusLogger.java:66 - MigrationStage 0 0 9 0 0
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,829 StatusLogger.java:66 - ValidationExecutor 0 0 0 0 0
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,830 StatusLogger.java:66 - Sampler 0 0 0 0 0
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,830 StatusLogger.java:66 - MiscStage 0 0 0 0 0
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,831 StatusLogger.java:66 - CommitLogArchiver 0 0 0 0 0
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,831 StatusLogger.java:66 - MemtableFlushWriter 1 1 1756 0 0
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,831 StatusLogger.java:66 - PendingRangeCalculator 0 0 11 0 0
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,832 StatusLogger.java:66 - MemtableReclaimMemory 0 0 1756 0 0
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,832 StatusLogger.java:66 - MemtablePostFlush 1 2 3819 0 0
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,832 StatusLogger.java:66 - CompactionExecutor 2 32 742 0 0
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,833 StatusLogger.java:66 - InternalResponseStage 0 0 0 0 0
INFO [HANDSHAKE-/10.0.1.142] 2015-05-23 05:43:45,086 OutboundTcpConnection.java:485 - Handshaking version with /10.0.1.142
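
For what it's worth, the dropped MUTATION messages and the CompactionExecutor backlog (2 active, 32 pending) in the snippet above can also be watched live instead of waiting for StatusLogger to print them:

    nodetool tpstats              # per-pool active/pending/blocked counts and dropped-message totals
    watch -n 5 nodetool tpstats   # the same, refreshed every 5 seconds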

Update:

The issue persists. I thought that once a compaction finished on each node the node went back to normal, but it doesn't. After a few hours, the CPU jumps back to 100% and the load average is in the range of 100-300.

I am downgrading back to 2.1.4.

Update:

I used phact's dumpThreads script to get stack traces. I also tried using jvmtop, but it just seemed to hang.

The output is too big to paste here, but you can find it at http://downloads.gryphonet.com/cassandra/.

Username: cassandra Password: cassandra
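
For anyone who cannot fetch those files: the script essentially combines a jstack thread dump with per-thread CPU usage from top (it writes jstack.out and top.out), so a rough manual equivalent looks like the sketch below. The PID lookup, file names and example TID are assumptions, and jstack needs a JDK on the node.

    pid=$(pgrep -f CassandraDaemon)       # assumes a single Cassandra JVM on the box
    top -b -H -n 1 -p "$pid" > top.out    # per-thread CPU usage, batch mode
    jstack "$pid" > jstack.out            # full JVM thread dump

    tid=12345                                        # replace with the hottest TID from top.out
    grep "nid=$(printf '0x%x' "$tid")" jstack.out    # locate that thread in the dump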

Amir Tuval
  • are you using DTCS? what compaction strategy? – phact May 22 '15 at 20:16
  • Can you paste your data model? – phact May 22 '15 at 20:42
  • What is in the cassandra logs? – psanford May 22 '15 at 22:38
  • Can you pull a thread dump? Here's how -- `sudo su - cassandra -s /bin/bash` `wget https://gist.githubusercontent.com/phact/22aa5085e58a43de7425/raw/de435ea127f1acde1054ad3c6af4995856e1a1be/threadDump.sh` `wget https://gist.githubusercontent.com/phact/6c2699e884950acd2ed0/raw/688827d8f1c0fc882985394f2e1ee5be21bef804/topthreads.py` `export mypid=$(cat /run/dse/dse.pid)` `sudo chmod 777 threadDump.sh` `./threadDump.sh $mypid 10 10` Paste jstack.out and top.out – phact May 24 '15 at 19:14
  • Or drop them on pastebin or something – phact May 24 '15 at 19:15
  • I have re-upgraded one of the nodes to 2.1.5. Waiting for the issue to reproduce. Will update as soon as I have something – Amir Tuval May 25 '15 at 19:14

2 Answers


Try using jvmtop to see what the Cassandra process is doing. It has two modes: one shows the currently running threads, and the other (--profile) shows the distribution of CPU time per method. Paste both outputs here.
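
For reference, a minimal way to run it (assuming the jvmtop tarball is unpacked in the current directory and `<PID>` is the Cassandra process id):

    ./jvmtop.sh                   # list local JVMs and their PIDs
    ./jvmtop.sh <PID>             # live thread view for the chosen JVM
    ./jvmtop.sh --profile <PID>   # CPU distribution per method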

gilg
  • I have re-upgraded one of the nodes to 2.1.5. Waiting for the issue to reproduce. Will update as soon as I have something – Amir Tuval May 25 '15 at 19:15

Answering my own question -

We are using a very specific Thrift API, describe_splits_ex, and that seems to cause the issue. It is obvious when looking at the stack traces of all the different threads while the CPU usage goes to 100%. For us it was easy to fix, because we use this API as an optimization rather than a necessity, so we simply stopped using it and the issue went away.
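
If you want to check whether you are hitting the same thing, it shows up clearly in a thread dump; the file name below assumes the jstack.out produced by the thread-dump steps from the comments above.

    grep -c 'describe_splits_ex' jstack.out            # how many threads are inside that call
    grep -B 5 -A 15 'describe_splits_ex' jstack.out    # surrounding stack frames for context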

However, this API is also used by the cassandra-hadoop connector (at least it was in earlier versions), so if you are using the connector I would test before upgrading to 2.1.5.
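
A rough way to check whether a Hadoop job goes through that path is to look for the Thrift-based input formats in the job's sources; the class names below are the usual ones from org.apache.cassandra.hadoop and may differ in your version.

    grep -RnE 'ColumnFamilyInputFormat|CqlInputFormat' --include='*.java' .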

I am not sure what change in 2.1.5 caused the issue, but I know it did not happen in 2.1.4 and happened consistently in 2.1.5.

Amir Tuval