Hi everyone,
A few days ago I upgraded our 6-node EC2 cluster from Cassandra 2.1.4 to 2.1.5.
Since then, all my nodes have "exploded" in CPU usage: they're at 100% CPU much of the time, and their load average is between 100 and 300 (!!!).
This did not start immediately after the upgrade. It started a few hours afterwards on one of the nodes, and slowly more and more nodes began to exhibit the same behavior. It seems to correlate with the compaction of our largest column family, and after the compaction completes (~24 hours after it starts) the node seems to go back to normal. It has only been 2 days or so, so I'm hoping it will not happen again, but I am still monitoring this.
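For what it's worth, this is how I've been checking whether the CPU spikes line up with compaction (standard nodetool commands; the pgrep pattern is just how the daemon shows up on my boxes):

# Compactions currently in progress and bytes remaining
nodetool compactionstats

# Thread pool stats - a growing Pending count on CompactionExecutor
# means compaction is falling behind
nodetool tpstats

# Per-thread CPU inside the Cassandra JVM - shows whether the hot
# threads are the CompactionExecutor ones
top -H -p $(pgrep -f CassandraDaemon)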
Here are my questions:
- Is this a bug or an expected behavior?
If it is expected behavior -
- What is the explanation for this issue?
- Is it documented somewhere that I have missed?
- Should I do upgrades differently? Maybe 1 or 2 nodes at a time every 24 hours or so? What is the best practice? (The per-node procedure I have in mind is sketched after these questions.)
If it is a bug -
- Is it known?
- Where should I report this? What data should I add?
- Will downgrading back to 2.1.4 work?
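For reference, here is the per-node rolling-upgrade procedure I have in mind; a minimal sketch assuming a package-based install (the package command is a placeholder for whatever your package manager uses):

# Run on one node at a time; let the cluster settle between nodes
nodetool drain                           # flush memtables, stop accepting writes
sudo service cassandra stop
sudo apt-get install cassandra=2.1.5     # placeholder - use your package manager
sudo service cassandra start
nodetool status                          # wait until this node shows UN (Up/Normal)
nodetool upgradesstables                 # rewrite sstables if the on-disk format
                                         # changed (likely a no-op for a 2.1.x bump)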
Any feedback on this would be great.
Thanks
Amir
UPDATE:
This is the structure of the table in question:
CREATE TABLE tbl1 (
    key text PRIMARY KEY,
    created_at timestamp,
    customer_id bigint,
    device_id bigint,
    event text,
    fail_count bigint,
    generation bigint,
    gr_id text,
    imei text,
    raw_post text,
    "timestamp" timestamp
) WITH COMPACT STORAGE
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = 'NONE';
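Since the table uses SizeTieredCompactionStrategy, I've also been watching its SSTable count between compactions ("my_keyspace" is a stand-in for the real keyspace name):

# Per-table stats - an SSTable count that keeps climbing suggests
# writes are outpacing compaction
nodetool cfstats my_keyspace.tbl1

# Latency percentiles and SSTables touched per read
nodetool cfhistograms my_keyspace tbl1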
The logs do not reveal much (at least to me). Here's a snippet of what they look like:
INFO [WRITE-/10.0.1.142] 2015-05-23 05:43:42,577 YamlConfigurationLoader.java:92 - Loading settings from file:/etc/cassandra/cassandra.yaml
INFO [WRITE-/10.0.1.142] 2015-05-23 05:43:42,580 YamlConfigurationLoader.java:135 - Node configuration:[authenticator=AllowAllAuthenticator; authorizer=AllowAllAuthorizer; auto_snapshot=true; batch_size_warn_threshold_in_kb=5; batchlog_replay_throttle_in_kb=1024; broadcast_rpc_address=10.0.2.145; cas_contention_timeout_in_ms=1000; client_encryption_options=; cluster_name=Gryphonet21 Cluster; column_index_size_in_kb=64; commit_failure_policy=stop; commitlog_directory=/data/cassandra/commitlog; commitlog_segment_size_in_mb=32; commitlog_sync=periodic; commitlog_sync_period_in_ms=10000; compaction_throughput_mb_per_sec=16; concurrent_counter_writes=32; concurrent_reads=32; concurrent_writes=32; counter_cache_save_period=7200; counter_cache_size_in_mb=null; counter_write_request_timeout_in_ms=5000; cross_node_timeout=false; data_file_directories=[/data/cassandra/data]; disk_failure_policy=stop; dynamic_snitch_badness_threshold=0.1; dynamic_snitch_reset_interval_in_ms=600000; dynamic_snitch_update_interval_in_ms=100; endpoint_snitch=GossipingPropertyFileSnitch; hinted_handoff_enabled=true; hinted_handoff_throttle_in_kb=1024; incremental_backups=false; index_summary_capacity_in_mb=null; index_summary_resize_interval_in_minutes=60; inter_dc_tcp_nodelay=false; internode_compression=all; key_cache_save_period=14400; key_cache_size_in_mb=null; max_hint_window_in_ms=10800000; max_hints_delivery_threads=2; memtable_allocation_type=heap_buffers; native_transport_port=9042; num_tokens=16; partitioner=RandomPartitioner; permissions_validity_in_ms=2000; range_request_timeout_in_ms=10000; read_request_timeout_in_ms=5000; request_scheduler=org.apache.cassandra.scheduler.NoScheduler; request_timeout_in_ms=10000; row_cache_save_period=0; row_cache_size_in_mb=0; rpc_address=0.0.0.0; rpc_keepalive=true; rpc_port=9160; rpc_server_type=sync; saved_caches_directory=/data/cassandra/saved_caches; seed_provider=[{class_name=org.apache.cassandra.locator.SimpleSeedProvider, parameters=[{seeds=10.0.1.141,10.0.2.145,10.0.3.149}]}]; server_encryption_options=; snapshot_before_compaction=false; ssl_storage_port=7001; sstable_preemptive_open_interval_in_mb=50; start_native_transport=true; start_rpc=true; storage_port=7000; thrift_framed_transport_size_in_mb=15; tombstone_failure_threshold=100000; tombstone_warn_threshold=1000; trickle_fsync=false; trickle_fsync_interval_in_kb=10240; truncate_request_timeout_in_ms=60000; write_request_timeout_in_ms=2000]
INFO [HANDSHAKE-/10.0.1.142] 2015-05-23 05:43:42,591 OutboundTcpConnection.java:494 - Cannot handshake version with /10.0.1.142
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,713 MessagingService.java:887 - 135 MUTATION messages dropped in last 5000ms
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,713 StatusLogger.java:51 - Pool Name                    Active   Pending      Completed   Blocked  All Time Blocked
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,714 StatusLogger.java:66 - CounterMutationStage              0         0              0         0                 0
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,714 StatusLogger.java:66 - ReadStage                         5         1        5702809         0                 0
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,715 StatusLogger.java:66 - RequestResponseStage              0        45       29528010         0                 0
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,715 StatusLogger.java:66 - ReadRepairStage                   0         0            997         0                 0
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,715 StatusLogger.java:66 - MutationStage                     0        31       43404309         0                 0
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,716 StatusLogger.java:66 - GossipStage                       0         0         569931         0                 0
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,716 StatusLogger.java:66 - AntiEntropyStage                  0         0              0         0                 0
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,716 StatusLogger.java:66 - CacheCleanupExecutor              0         0              0         0                 0
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,717 StatusLogger.java:66 - MigrationStage                    0         0              9         0                 0
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,829 StatusLogger.java:66 - ValidationExecutor                0         0              0         0                 0
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,830 StatusLogger.java:66 - Sampler                           0         0              0         0                 0
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,830 StatusLogger.java:66 - MiscStage                         0         0              0         0                 0
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,831 StatusLogger.java:66 - CommitLogArchiver                 0         0              0         0                 0
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,831 StatusLogger.java:66 - MemtableFlushWriter               1         1           1756         0                 0
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,831 StatusLogger.java:66 - PendingRangeCalculator            0         0             11         0                 0
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,832 StatusLogger.java:66 - MemtableReclaimMemory             0         0           1756         0                 0
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,832 StatusLogger.java:66 - MemtablePostFlush                 1         2           3819         0                 0
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,832 StatusLogger.java:66 - CompactionExecutor                2        32            742         0                 0
INFO [ScheduledTasks:1] 2015-05-23 05:43:42,833 StatusLogger.java:66 - InternalResponseStage             0         0              0         0                 0
INFO [HANDSHAKE-/10.0.1.142] 2015-05-23 05:43:45,086 OutboundTcpConnection.java:485 - Handshaking version with /10.0.1.142
UPDATE:
The issue persists. I thought that once one compaction finished on each node the node would go back to normal, but it doesn't. After a few hours, CPU jumps back to 100% and the load average returns to the 100-300 range.
I am downgrading back to 2.1.4.
UPDATE:
I used phact's dumpThreads script to get stack traces. I also tried jvmtop, but it just seemed to hang.
The output is too big to paste here, but you can find it at http://downloads.gryphonet.com/cassandra/.
Username: cassandra Password: cassandra
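In case anyone wants to dig in without those tools, this is the manual fallback I'd use to tie a hot OS thread to a Java stack frame (plain JDK jstack, since jvmtop kept hanging; <hot-thread-pid> and <hex-id> are placeholders taken from the top output):

# 1. Find the busiest threads inside the Cassandra JVM
PID=$(pgrep -f CassandraDaemon)
top -b -n 1 -H -p "$PID" | head -20

# 2. Convert a hot thread's PID to hex - jstack reports native
#    thread ids as hex "nid" values
printf 'nid=0x%x\n' <hot-thread-pid>

# 3. Dump all thread stacks and find that nid
jstack "$PID" > threads.txt
grep -A 20 'nid=0x<hex-id>' threads.txt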