
We have a 3-node Cassandra cluster.

We have an application that uses a keyspace which creates a high read load on the disks. The problem has a cumulative effect: the longer we work with the keyspace, the more the disk reads grow.

Reads go up to > 700 MB/s. Then the storage (SAN) begins to degrade, and then the Cassandra cluster degrades as well.
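
A rough way to watch this growth over time is a sampling loop like the sketch below; the 10-minute interval is arbitrary, and the table name is the one from the cfstats output further down:

# sample disk read throughput and per-table read counters every 10 minutes
while true; do
    date
    iostat -x -m 1 2                  # the second report shows the current per-device read throughput
    nodetool cfstats box_messages.messages | grep -E 'Local read|SSTable count'
    sleep 600
done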

UPD 25.10.2021: "I wrote it a little wrong: the SAN space is allocated to the virtual machine and presented like a normal drive."

The only thing that helps is clearing the keyspace.
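
For a single table, clearing comes down to a plain CQL TRUNCATE; a minimal sketch, assuming the messages table from the cfstats output below:

-- runs cluster-wide from any node's cqlsh
TRUNCATE box_messages.messages;
-- with auto_snapshot enabled (the default) each node keeps a snapshot of the
-- truncated SSTables, so disk space is not freed until the snapshot is cleared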

Output of `nodetool tpstats` and `nodetool cfstats`:

[cassandra-01 ~]$ nodetool tpstats
Pool Name                         Active   Pending      Completed   Blocked  All time blocked
ReadStage                              1         1     1837888055         0                 0
MiscStage                              0         0              0         0                 0
CompactionExecutor                     0         0        6789640         0                 0
MutationStage                          0         0      870873552         0                 0
MemtableReclaimMemory                  0         0           7402         0                 0
PendingRangeCalculator                 0         0              9         0                 0
GossipStage                            0         0       18939072         0                 0
SecondaryIndexManagement               0         0              0         0                 0
HintsDispatcher                        0         0              3         0                 0
RequestResponseStage                   0         0     1307861786         0                 0
Native-Transport-Requests              0         0     2981687196         0                 0
ReadRepairStage                        0         0         346448         0                 0
CounterMutationStage                   0         0              0         0                 0
MigrationStage                         0         0            168         0                 0
MemtablePostFlush                      0         0           8193         0                 0
PerDiskMemtableFlushWriter_0           0         0           7402         0                 0
ValidationExecutor                     0         0             21         0                 0
Sampler                                0         0          10988         0                 0
MemtableFlushWriter                    0         0           7402         0                 0
InternalResponseStage                  0         0           3404         0                 0
ViewMutationStage                      0         0              0         0                 0
AntiEntropyStage                       0         0             71         0                 0
CacheCleanupExecutor                   0         0              0         0                 0

Message type           Dropped
READ                         7
RANGE_SLICE                  0
_TRACE                       0
HINT                         0
MUTATION                     5
COUNTER_MUTATION             0
BATCH_STORE                  0
BATCH_REMOVE                 0
REQUEST_RESPONSE             0
PAGED_RANGE                  0
READ_REPAIR                  0

[cassandra-01 ~]$ nodetool cfstats box_messages -H
Total number of tables: 73
----------------
Keyspace : box_messages
    Read Count: 48847567
    Read Latency: 0.055540737801741485 ms
    Write Count: 69461300
    Write Latency: 0.010656743870327794 ms
    Pending Flushes: 0
        Table: messages
        SSTable count: 6
        Space used (live): 3.84 GiB
        Space used (total): 3.84 GiB
        Space used by snapshots (total): 0 bytes
        Off heap memory used (total): 10.3 MiB
        SSTable Compression Ratio: 0.23265712113582082
        Number of partitions (estimate): 4156030
        Memtable cell count: 929912
        Memtable data size: 245.04 MiB
        Memtable off heap memory used: 0 bytes
        Memtable switch count: 92
        Local read count: 20511450
        Local read latency: 0.106 ms
        Local write count: 52111294
        Local write latency: 0.013 ms
        Pending flushes: 0
        Percent repaired: 0.0
        Bloom filter false positives: 57318
        Bloom filter false ratio: 0.00841
        Bloom filter space used: 6.56 MiB
        Bloom filter off heap memory used: 6.56 MiB
        Index summary off heap memory used: 1.78 MiB
        Compression metadata off heap memory used: 1.95 MiB
        Compacted partition minimum bytes: 73
        Compacted partition maximum bytes: 17084
        Compacted partition mean bytes: 3287
        Average live cells per slice (last five minutes): 2.0796939751354797
        Maximum live cells per slice (last five minutes): 10
        Average tombstones per slice (last five minutes): 1.1939751354797576
        Maximum tombstones per slice (last five minutes): 2
        Dropped Mutations: 5 bytes
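
If it helps, the per-read SSTable count and the partition size/latency percentiles for the same table can be pulled with tablehistograms (cfhistograms on older nodetool versions):

[cassandra-01 ~]$ nodetool tablehistograms box_messages messages
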
Nick
    Using SAN for Cassandra is a known anti-pattern. – Alex Ott Oct 17 '21 at 15:30
  • Cassandra has a mechanism called compaction (`nodetool compactionstats`) which will read the data that was flushed to disk and compact (by default) 4 SSTables of similar size into a new one, both to get rid of different row versions and to keep the number of files on the filesystem in check. This is heavily IO-bound and will affect all nodes, likely at the same time, hitting your SAN. – Mandraenke Oct 18 '21 at 09:28
  • Important to note: if the same SAN device is hosting the disks for all 3 nodes, it's also acting as a single point of failure. – Aaron Oct 18 '21 at 20:39
  • I wrote it a little wrong: the SAN space is allocated to the virtual machine and presented like a normal drive. – Nick Oct 25 '21 at 08:20

1 Answer


(I'm unable to comment and hence posting it as an answer)

As folks mentioned, SAN is not going to be the best fit here, and one could read through the list of anti-patterns documented here, which also applies to OSS C*.
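
It would also be worth checking whether compaction is keeping up, since falling behind makes each read touch more SSTables and pushes disk reads up over time; a minimal check, assuming nodetool access on each node:

# pending compactions and what is currently compacting
nodetool compactionstats -H
# history of recent compactions with sizes before and after
nodetool compactionhistory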

Madhavan