
I would like to track and measure READ and WRITE performance in Apache Cassandra (client: cqlsh). I know TRACING ON is available, but I didn't find it very useful.

These are examples of what I expect to track in Cassandra:

I have a 3-node Cassandra cluster and a table with 1 million entries. I would like to measure READ / WRITE performance in the following cases:

1) WRITE - 1 INSERT into the table that already holds 1 million entries.
2) WRITE - 1 UPSERT on one of the 1 million existing entries.
3) READ - 1 READ of a single entry out of the 1 million.
4) READ - all 1 million entries.

These cases involve both single-partition and multi-partition access.

Any help with tracking performance is appreciated.

Vadim Kotov
Harry

1 Answer


Performance statistics for a particular keyspace/table can be obtained with the nodetool tablehistograms command (as described in the documentation).
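Each data row of the tablehistograms output carries six values: percentile, SSTables, write latency, read latency, partition size, and cell count (the same headings quoted in the comments below). As a rough sketch only, assuming a whitespace-separated row and latencies reported in microseconds, one row could be labelled like this. The function name `parse_histogram_row` and the dictionary keys are made up for illustration and are not part of nodetool.

```python
# Column labels for one data row of `nodetool tablehistograms`.
# The key names are hypothetical, chosen here only to carry the units.
COLUMNS = (
    "percentile",
    "sstables",
    "write_latency_us",
    "read_latency_us",
    "partition_size_bytes",
    "cell_count",
)


def parse_histogram_row(line):
    """Turn one row, e.g. '99% 1.00 263.21 263.21 2299 310', into a dict."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(
            f"expected {len(COLUMNS)} fields, got {len(fields)}: {line!r}"
        )
    # The first field is a label such as '99%', so it stays a string.
    row = {"percentile": fields[0]}
    # The remaining fields are numeric.
    for name, value in zip(COLUMNS[1:], fields[1:]):
        row[name] = float(value)
    return row


row = parse_histogram_row("99% 1.00 263.21 263.21 2299 310")
print(row["percentile"], row["write_latency_us"], row["read_latency_us"])
```

Comparing rows captured before and after a test run gives a quick, if coarse, view of how latency changed.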

To generate load against tables, you can use the cassandra-stress tool that ships with Cassandra. It is quite powerful, but requires writing a correct configuration file that mimics your tables. This blog post is quite helpful, together with the official documentation.

Alex Ott
  • What do the next two comments mean? The docs don't have much information, and neither does this link: https://stackoverflow.com/questions/34688069/how-to-read-the-cassandra-nodetool-histograms-percentile-and-other-columns – Harry Dec 13 '17 at 09:39
  • Table histogram headings: Percentile SSTables Write Latency Read Latency Partition Size Cell Count – Harry Dec 13 '17 at 09:39
  • Its values: 99% 1.00 263.21 263.21 2299 310 – Harry Dec 13 '17 at 09:39
  • The first value is the percentile: 99% of transactions fall at or below the numbers in this row. 2nd - number of SSTable files. 3rd - max write latency for this percentile, about 264 microseconds. 4th - read latency (strange that both have the same value). 5th - size of a single partition in bytes. Last - number of cells (individual values) inside the partition; because you have time as the clustering key, a partition includes multiple "rows". – Alex Ott Dec 13 '17 at 10:17
  • Does "3rd - max write latency for this percentile is 264 microseconds" mean that 99% of my requests took at most 264 microseconds? And the same for the 4th? If possible, could you please explain the 3rd and 4th points in more detail? – Harry Dec 13 '17 at 10:20
  • Also, I have 800 entries in that table. How come the cell count is 310? – Harry Dec 13 '17 at 10:21
  • To give you info, SSTable Dump : "cells" : [ { "name" : "flashmode", "value" : "yes" }, { "name" : "physicalusage", "value" : 38 }, { "name" : "readbw", "value" : 29 }, { "name" : "readiops", "value" : 12 }, { "name" : "totalcapacity", "value" : 20 }, { "name" : "writebw", "value" : 28 }, { "name" : "writeiops", "value" : 81 }, { "name" : "writelatency", "value" : 4 } ] – Harry Dec 13 '17 at 10:27
  • 310 is the number of cells in individual partition(s); some partitions may have more data, some less. Regarding latency: yes, that's correct, the max time for 99% of your read or write requests is ~264 microseconds. But I believe this data isn't very representative of real performance. – Alex Ott Dec 13 '17 at 10:39
  • Another piece of cake for your brother : https://stackoverflow.com/questions/47793714/how-partition-read-is-choosen-in-cassandra – Harry Dec 13 '17 at 13:04
  • I am curious how the data inside a single partition is distributed. Could you please guide me on this? https://stackoverflow.com/questions/47793714/how-partition-read-is-choosen-in-cassandra – Harry Dec 13 '17 at 13:38