Cassandra 1.2: how to get the real load on each virtual node

Question

I have a Cassandra 1.2 cluster and I'm using virtual nodes with the ByteOrderedPartitioner. I know this is not recommended, because I have to make sure the data's keys are evenly distributed across the keyspace so that the load is properly balanced across the physical nodes. The problem is that I can't find a way to see the actual load on each virtual node. If I run nodetool like this:

nodetool status

I receive an output like this one:

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns   Host ID                               Rack
UN  XXX.XXX.XXX.XXX 14.73 GB   256     11.3%  a4d365ca-f21b-4418-ab0e-656520d931b5  rack1
UN  XXX.XXX.XXX.XXX  8.51 GB   256     10.6%  f587fe0b-e765-4c02-bd50-cef9758e9a6b  rack1
UN  XXX.XXX.XXX.XXX 10.92 GB   256     10.3%  6160ca91-1e07-47ec-8fa9-ef886c140e91  rack1
UN  XXX.XXX.XXX.XXX  9.62 GB   256     10.0%  9c4a8476-1de2-455b-956a-c4cea31675bf  rack1
UN  XXX.XXX.XXX.XXX 11.11 GB   256     11.2%  61639d9c-ad49-4f38-86b3-cd48e0c90c49  rack1
UN  XXX.XXX.XXX.XXX  7.86 GB   256     35.1%  195b6f79-7d68-4a98-8a9b-55bd0dd699e2  rack1
UN  XXX.XXX.XXX.XXX 11.29 GB   256     11.4%  0ac03b6a-0a0e-4f83-8b9e-2f16d4db47ab  rack1

This shows that the distribution is not very good, but I want to see the actual distribution across the virtual nodes. The problem is that running:

nodetool ring

gives me a lot of entries, one per virtual node (256 in total for the node I run the command on), but the information is pretty much useless: the load looks the same for every virtual node, and each row simply repeats the physical node's total (11.29 GB), which is unrealistic as a per-vnode size.

XXX.XXX.XXX.XXX  rack1       Up     Normal  11.29 GB        11.45%              Token(bytes[2daad5a3e325e152d7be5bc2d5f87fef])
XXX.XXX.XXX.XXX  rack1       Up     Normal  11.29 GB        11.45%              Token(bytes[2ffef9060e59c1c922a1ecf8e2643794])
XXX.XXX.XXX.XXX  rack1       Up     Normal  11.29 GB        11.45%              Token(bytes[31041cc591d63d91a67a21ecf44a57c2])
XXX.XXX.XXX.XXX  rack1       Up     Normal  11.29 GB        11.45%              Token(bytes[31bbcaafcdcb2ecc3a4ef3fb3af4b82b])
XXX.XXX.XXX.XXX  rack1       Up     Normal  11.29 GB        11.45%              Token(bytes[324e972b43b63d63df4255e459fed524])
XXX.XXX.XXX.XXX  rack1       Up     Normal  11.29 GB        11.45%              Token(bytes[3353224ae20e902e5b2b243c8fc5ff97])
XXX.XXX.XXX.XXX  rack1       Up     Normal  11.29 GB        11.45%              Token(bytes[350ed29fa9a1a377b8014beef1d160f0])
XXX.XXX.XXX.XXX  rack1       Up     Normal  11.29 GB        11.45%              Token(bytes[3553ad83beaf91d98a692e22718e321d])
XXX.XXX.XXX.XXX  rack1       Up     Normal  11.29 GB        11.45%              Token(bytes[35893a82c84982c467251115a7406f00])
XXX.XXX.XXX.XXX  rack1       Up     Normal  11.29 GB        11.45%              Token(bytes[37fad1c7dbd8d66d75747699ce4d6d2e])
XXX.XXX.XXX.XXX  rack1       Up     Normal  11.29 GB        11.45%              Token(bytes[388bcf470bd5c97e1f3cb45c01bd1f2c])
XXX.XXX.XXX.XXX  rack1       Up     Normal  11.29 GB        11.45%              Token(bytes[38a0cdc654a9934e5a16e5242c26fc5f])
XXX.XXX.XXX.XXX  rack1       Up     Normal  11.29 GB        11.45%              Token(bytes[393b8185b527f036cd44f5f6791484b9])
XXX.XXX.XXX.XXX  rack1       Up     Normal  11.29 GB        11.45%              Token(bytes[39ae4356a22bbb5ea20d5c6fc83cd2de])
XXX.XXX.XXX.XXX  rack1       Up     Normal  11.29 GB        11.45%              Token(bytes[39dd01bb66beeeb46627f0303671c30d])
XXX.XXX.XXX.XXX  rack1       Up     Normal  11.29 GB        11.45%              Token(bytes[3a49f707a7cea045935524900094c4e4])
XXX.XXX.XXX.XXX  rack1       Up     Normal  11.29 GB        11.45%              Token(bytes[3a58eba6a5730a75fd899cf77c93d6cb])

My question is, is there another tool/way of getting the real load of each virtual node in a Cassandra cluster?

Thanks in advance!

It sounds very much like you want to use the RandomPartitioner. This gives you even distribution of keys. Did you choose the tokens manually? Choosing random tokens, which is the default for virtual nodes, won't in general give a good distribution for ByteOrderedPartitioner. — Richard, Jun 08 '13 at 09:02

score 2 · Answer 1 · answered May 23 '14 at 13:53

When you run nodetool ring without a keyspace, it examines the load based on the SimpleStrategy for replication. If you have your tokens properly distributed for NetworkTopologyStrategy, this will look "off".

Since the replication strategy determines load, and each keyspace can have a different replication strategy, you need to pass the keyspace name as an argument after ring (nodetool ring <keyspace>) to see the true load distribution for that keyspace.

If you are using the NetworkTopologyStrategy, nodetool ring <keyspace> will take into account datacenter and rack location to determine your token distribution, and give you an accurate load value.
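
As a complement to nodetool ring <keyspace>, here is a minimal Python sketch that parses saved nodetool ring output and reports how much of the token space each virtual node's range covers. It is not part of nodetool and rests on assumptions: the row layout matches the one pasted in the question, all tokens have the same hex length, and the function names (parse_ring, range_widths, width_by_address) and IP addresses are made up for illustration. It measures the width of each token range only, not the bytes stored in it. With the ByteOrderedPartitioner the data actually held in a range still depends on how the keys are distributed.

```python
import re
from collections import defaultdict

# One data row of `nodetool ring` output, in the layout pasted in the question:
# address, rack, status, state, load, owns%, Token(bytes[<hex>]).
ROW = re.compile(
    r"^(?P<address>\S+)\s+(?P<rack>\S+)\s+(?P<status>\S+)\s+(?P<state>\S+)\s+"
    r"(?P<load>[\d.]+ \S+)\s+(?P<owns>[\d.]+)%\s+"
    r"Token\(bytes\[(?P<token>[0-9a-fA-F]+)\]\)\s*$"
)


def parse_ring(text):
    """Return one dict per row that matches the assumed layout; skip other lines."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append(match.groupdict())
    return rows


def range_widths(rows):
    """Return (address, token, fraction) per vnode.

    fraction is the width of the range (previous token, token] divided by the
    size of the token space. Assumes every token has the same hex length.
    """
    if not rows:
        return []
    lengths = {len(r["token"]) for r in rows}
    if len(lengths) != 1:
        raise ValueError("this sketch assumes fixed-length tokens")
    space = 16 ** lengths.pop()  # number of distinct tokens of that length
    ordered = sorted(rows, key=lambda r: int(r["token"], 16))
    result = []
    for i, row in enumerate(ordered):
        cur = int(row["token"], 16)
        prev = int(ordered[i - 1]["token"], 16)  # i == 0 wraps to the last token
        gap = (cur - prev) % space or space  # a lone token covers the whole ring
        result.append((row["address"], row["token"], gap / space))
    return result


def width_by_address(rows):
    """Sum the range fractions of all vnodes belonging to each address."""
    totals = defaultdict(float)
    for address, _token, fraction in range_widths(rows):
        totals[address] += fraction
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical three-token ring; real input would be the full output of
    # `nodetool ring <keyspace>` covering every node in the cluster.
    sample = """
10.0.0.1  rack1  Up  Normal  11.29 GB  11.45%  Token(bytes[2daad5a3e325e152d7be5bc2d5f87fef])
10.0.0.2  rack1  Up  Normal  8.51 GB   10.60%  Token(bytes[6dfef9060e59c1c922a1ecf8e2643794])
10.0.0.1  rack1  Up  Normal  11.29 GB  11.45%  Token(bytes[ad041cc591d63d91a67a21ecf44a57c2])
"""
    for address, token, fraction in range_widths(parse_ring(sample)):
        print(f"{address}  {token}  {fraction:.2%}")
```

The widths are only meaningful when the input covers the whole ring, since each range is measured from the preceding token regardless of which node owns it. Very wide or very narrow ranges point to the kind of imbalance that random tokens can produce with the ByteOrderedPartitioner.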

score 0 · Answer 2 · answered Sep 17 '13 at 14:57

Have you tried Cassandra OpsCenter? http://www.datastax.com/what-we-offer/products-services/datastax-opscenter

I'm not sure (I've never tried) whether it can show the real load of each virtual node specifically, but it's a great tool for monitoring and managing your database.
