
I have been trying to auto-scale a 3-node Cassandra cluster with a replication factor (RF) of 3 and a consistency level (CL) of 1 on Amazon EC2 instances. Despite the load balancer, one of the auto-scaled nodes has zero CPU utilization while the other auto-scaled node carries considerable traffic.

I have run this experiment more than 4 times, auto-scaling a 3-node cluster with RF 3 and CL 1, and the CPU utilization on one of the auto-scaled nodes is still zero. The overall CPU utilization drops, but one of the auto-scaled nodes stays idle from the point of auto-scaling onward.

Note that the two nodes launched at the point of auto-scaling are started by the same launch configuration; the two nodes have the same configuration in every respect. There is an alarm for triggering the launch of the nodes, and the scaling policy is set as per that alarm.

Could a bash script run from the instance user data address this?

For example, one that alters the keyspaces?
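Something like the following is what I have in mind (just a sketch; the keyspace name "mykeyspace" and the addresses are placeholders, not my actual setup):

    #!/bin/bash
    # Hypothetical user-data sketch: wait until the local node accepts CQL,
    # then raise the replication factor of an existing keyspace.
    NODE_IP=$(hostname -I | awk '{print $1}')
    until cqlsh "$NODE_IP" -e 'DESCRIBE CLUSTER' >/dev/null 2>&1; do
      sleep 10   # Cassandra is not yet accepting CQL connections
    done
    cqlsh "$NODE_IP" -e "ALTER KEYSPACE mykeyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};"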

Can someone let me know what could be the reason behind this behavior?

– siddhartha s

2 Answers


AWS auto scaling and load balancing are not a good fit for Cassandra. Cassandra has its own built-in clustering: nodes use seed nodes to discover the other members of the cluster, so there is no need for an ELB. And auto scaling can screw you up, because the data has to be re-balanced across the nodes every time one is added or removed.

https://d0.awsstatic.com/whitepapers/Cassandra_on_AWS.pdf
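For illustration (the seed IPs and file path here are assumptions, not from the question), cluster membership is driven by cassandra.yaml rather than by a load balancer; a user-data fragment along these lines is all a new node needs to join:

    # Point the new node at fixed seed IPs for cluster discovery
    SEEDS="10.0.0.11,10.0.0.12"
    sed -i "s/- seeds: .*/- seeds: \"$SEEDS\"/" /etc/cassandra/cassandra.yaml
    # listen_address and rpc_address should be this node's own IP, not an ELB
    MY_IP=$(hostname -I | awk '{print $1}')
    sed -i "s/^listen_address: .*/listen_address: $MY_IP/" /etc/cassandra/cassandra.yaml
    sed -i "s/^rpc_address: .*/rpc_address: $MY_IP/" /etc/cassandra/cassandra.yaml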

– myron-semack
  • The issue still persists with an RF 3 and CL QUORUM configuration. In my yaml configuration, "seeds" is the seed address, and listen_address and rpc_address are the IP of the respective node. Is there any problem with my yaml configuration? Also, the three nodes being auto-scaled are in the same AZ. – siddhartha s Aug 25 '17 at 23:05

Yes, you don't need an ELB for Cassandra.

So you created a single-node Cassandra cluster and created some keyspace, then scaled Cassandra to three nodes and found that one new node was idle when accessing the existing keyspace. Is this understanding correct? Did you alter the existing keyspace's replication factor to 3? If not, the existing keyspace's data will still have only 1 replica.
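If that is the case, something along these lines should fix it (the keyspace name and contact IP are placeholders; use NetworkTopologyStrategy instead if you run multiple datacenters):

    # Raise the replication factor of the existing keyspace to 3
    cqlsh 10.0.0.11 -e "ALTER KEYSPACE mykeyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};"
    # ALTER only updates metadata; run repair (on each node) so existing
    # data is actually streamed to the new replicas
    nodetool repair mykeyspace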

When the new nodes are added, Cassandra automatically rebalances some tokens onto them. This is probably why you are seeing load on one of the new nodes: it happens to receive tokens that hold the keyspace's data.
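You can verify this by checking per-node ownership for that keyspace (the name is a placeholder); a node that owns little or none of it will sit idle:

    # "Owns (effective)" shows each node's share of this keyspace's data
    nodetool status mykeyspace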

– CloudStax
  • Yes, your understanding is right. My scaling policy adds 2 instances when the alarm is breached, so technically I am letting AWS add two Cassandra instances at the very same time. I am thinking that two Cassandra instances cannot join the existing cluster concurrently unless it is specified otherwise in the yaml configuration, but I cannot find the auto_bootstrap option in the yaml configuration. – siddhartha s Aug 31 '17 at 08:36
  • It would be fine to have two instances join the existing cluster concurrently. Cassandra will automatically bootstrap the new nodes. No option is required. – CloudStax Aug 31 '17 at 14:50