a) Adding Nodes in an AZ aware manner
The new nodes should be prepared similarly to the existing nodes (as described in the link you posted) in terms of ulimit settings, data drive preparation, YugaByte DB software installation, etc.
During cluster expansion, given that there are already enough nodes running the yb-master processes, the new nodes only need to run the yb-tserver process; the steps related to yb-master processes can be omitted when adding nodes to the cluster. [To read more about the roles of the yb-master and yb-tserver processes, see https://docs.yugabyte.com/latest/architecture/concepts/universe/.]
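If you want to double-check that the existing yb-master quorum is healthy before expanding, one option is the yb-admin list_all_masters command. A sketch (the ~/master/bin path and the node1:7100,node2:7100,node3:7100 master addresses are placeholders for your setup):
# List the masters and their Raft roles; for an RF=3 cluster you should
# see one LEADER and two FOLLOWERs before proceeding with the expansion.
~/master/bin/yb-admin --master_addresses node1:7100,node2:7100,node3:7100 list_all_masters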
When preparing the yb-tserver config file for the newly added nodes, be sure to set their placement info flags (cloud/region/zone) appropriately; these tell the system and its load balancer where each node is located. For example, for the first new node these flags might be:
--placement_cloud=aws
--placement_region=us-west
--placement_zone=us-west-2a
and for the other two nodes the --placement_zone might be us-west-2b and us-west-2c respectively.
You would have done something similar when setting up yb-tserver on the initial 6 nodes spread across the three AZs.
Starting these yb-tserver processes is no different from starting the initial servers. For example:
~/tserver/bin/yb-tserver --flagfile ~/yb-conf/tserver.conf >& /mnt/d0/yb-tserver.out &
Note: The value for the master addresses gflag tserver_master_addrs in tserver.conf should be the same as on the existing yb-tservers. That's what ensures these nodes seamlessly join the existing cluster.
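Putting the above together, a minimal tserver.conf flagfile for the first new node might look something like this (the host names, ports, and data directory are placeholders for your environment):
# Same master addresses as on the existing yb-tservers
--tserver_master_addrs=node1:7100,node2:7100,node3:7100
# This node's own RPC bind address
--rpc_bind_addresses=node7:9100
# Data drive(s) prepared earlier
--fs_data_dirs=/mnt/d0
# Placement info for AZ-aware data distribution
--placement_cloud=aws
--placement_region=us-west
--placement_zone=us-west-2a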
b) The nodes can be added/started all at once. There is no need to add them one at a time with a wait interval in between; doing so may actually cause data to be rebalanced/moved more times than necessary. When the system knows it needs to go from a 6-node state to a 9-node state all at once, it can reach the desired end state of a balanced cluster more optimally, performing only the required amount of data movement.
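For example, assuming passwordless ssh to the new hosts (node7, node8, node9 are placeholder names) and the flagfiles already in place, the three new yb-tservers can simply be started back to back:
# Start all three new tservers without waiting in between
for node in node7 node8 node9; do
  ssh "$node" 'nohup ~/tserver/bin/yb-tserver --flagfile ~/yb-conf/tserver.conf > /mnt/d0/yb-tserver.out 2>&1 &'
done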
c) No additional steps are needed to trigger load balancing! The system automatically rebalances the tablets (shards) in a rate-limited manner to keep the impact on the foreground application minimal. Currently, this per-node rate limit is controlled by the gflag remote_bootstrap_rate_limit_bytes_per_sec, and its default is 100 MB/sec. Depending on the workload and the available bandwidth, it can be adjusted to a more aggressive or conservative setting. Note that this rebalancing is a background and online operation in YugaByte DB, done by copying compressed files from the corresponding tablet leaders. It is therefore significantly lighter weight than in eventually consistent databases (like Apache Cassandra or MongoDB), which have to do a logical (uncompressed) read of data from all peers.
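If you do want to adjust that rate limit, one option (a sketch, using an illustrative value) is to set the flag in the tserver.conf flagfile before starting the process:
# ~200 MB/sec per node instead of the default ~100 MB/sec
--remote_bootstrap_rate_limit_bytes_per_sec=209715200
You can also watch the rebalancing converge on the yb-master web UI (by default http://<master-host>:7000/tablet-servers), where the per-node tablet counts should even out across all 9 nodes.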