I have 32 physical servers, each with a 32-core CPU and 128 GB of memory. I want to build a VoltDB cluster from all 32 servers with K-safety=2 and 32 partitions on each server, so we would get a VoltDB cluster with 256 available partitions to store data.

That looks like too many partitions to split tables across, especially when some tables don't have many records. But there will be too many copies of a table if we choose to replicate it instead.

If we start with a much smaller cluster of just a couple of servers, the worry is that the cluster will soon have to scale out as the business grows. I actually don't know how VoltDB reorganizes data when a cluster expands horizontally to more nodes.

Do you have any comments? They would be appreciated.

Simon Gao

1 Answer

It may be better to set sitesperhost to less than 32, so that some percentage of the cores stay free to run threads for subsystems like export or database replication, or to handle non-VoltDB processes. Typically the optimal number is somewhere from 8 to 24.

VoltDB creates the logical partitions based on the sitesperhost, the number of hosts, and the kfactor. If you need to scale out later, you can add additional nodes to the cluster which will increase the number of partitions, and VoltDB will gradually and automatically rebalance data from existing partitions to the new ones. You must add multiple servers together if you have kfactor > 0. For kfactor=2, you would add servers in sets of 3 so that they provide their own redundancy for the new partitions.
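The relationship between sitesperhost, host count, and kfactor can be sketched as simple arithmetic: each partition is stored kfactor + 1 times, so the number of unique partitions is the total number of sites divided by kfactor + 1. The function name below is hypothetical, the 6-host and 9-host figures are made-up inputs, and rejecting layouts that do not divide evenly is a choice made for this sketch only.

```python
def unique_partitions(hosts: int, sites_per_host: int, kfactor: int) -> int:
    """Unique partitions = total sites / (kfactor + 1)."""
    total_sites = hosts * sites_per_host
    copies = kfactor + 1  # each partition is stored kfactor + 1 times
    if total_sites % copies != 0:
        # Choice made for this sketch: reject layouts that do not divide evenly.
        raise ValueError(
            f"{total_sites} sites cannot be split into {copies} equal copies"
        )
    return total_sites // copies


# Hypothetical cluster: 6 hosts, 8 sites per host, kfactor=2.
before = unique_partitions(hosts=6, sites_per_host=8, kfactor=2)
# Same cluster after adding one set of 3 hosts.
after = unique_partitions(hosts=9, sites_per_host=8, kfactor=2)
print(before, after)  # 16 24
```

Adding a set of three hosts here contributes 24 new sites, which at kfactor=2 means 8 new unique partitions, each with its own three copies.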

Your data is distributed across the logical partitions based on a hash of the partition key value of a record, or the corresponding input parameter for routing the execution of a procedure to a partition. In this way, the client application code does not need to be aware of the number of partitions. It doesn't matter so much which partition each record goes to, but you can assume that any records that share the same partition key value will be located in the same partition.
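A rough illustration of that routing idea, not VoltDB's actual hashing: the sketch below uses MD5 and a modulo as a stand-in hash, and `partition_for`, the customer IDs, and the partition count are all hypothetical. It only shows that the same key value always maps to the same partition.

```python
import hashlib


def partition_for(key, partition_count: int) -> int:
    # Stand-in hash (MD5 of the key's text form); NOT VoltDB's real hashing.
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


PARTITIONS = 16  # hypothetical partition count

# Hypothetical rows: (partition key value, payload).
rows = [
    ("cust-1001", "order A"),
    ("cust-2002", "order B"),
    ("cust-1001", "order C"),
]

# Group payloads by the partition their key hashes to.
placement = {}
for customer_id, payload in rows:
    placement.setdefault(partition_for(customer_id, PARTITIONS), []).append(payload)

for partition_id, payloads in sorted(placement.items()):
    print(partition_id, payloads)
```

Both rows for `cust-1001` end up in the same list, whichever partition number that happens to be. A plain modulo like this would remap most keys whenever the partition count changes, so it should not be read as a model of how rebalancing works.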

Good partition keys are columns with high cardinality, such as ID columns. This will evenly distribute the data and procedure execution work across the partitions.
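To see why cardinality matters, here is a small simulation under assumptions: it uses the same kind of stand-in hash (MD5 plus modulo, not VoltDB's real hashing), a hypothetical 8-partition layout, and made-up data. It compares a unique ID column against a column with only three distinct values.

```python
import hashlib
from collections import Counter


def partition_for(key, partition_count: int) -> int:
    # Stand-in hash (MD5 of the key's text form); NOT VoltDB's real hashing.
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def rows_per_partition(keys, partition_count: int) -> list:
    # Count how many rows land in each partition for the given key values.
    counts = Counter(partition_for(k, partition_count) for k in keys)
    return [counts.get(p, 0) for p in range(partition_count)]


PARTITIONS = 8  # hypothetical partition count
ROWS = 10_000

# High cardinality: every row has its own ID.
high = rows_per_partition(range(ROWS), PARTITIONS)
# Low cardinality: only three distinct key values across all rows.
regions = ["US", "EU", "APAC"]
low = rows_per_partition((regions[i % 3] for i in range(ROWS)), PARTITIONS)

print("unique IDs:  ", high)
print("three values:", low)
```

With three distinct values, at most three of the eight partitions can ever receive rows, so the rest sit idle, while the ID column spreads rows over all of them.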

Typically a VoltDB cluster is sized based on the RAM requirements, rather than the need for performance, since the performance is very high on even a very small cluster.
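A back-of-the-envelope version of RAM-based sizing might look like the following. This is only a sketch under stated assumptions: it considers partitioned data alone (no replicated tables, indexes, or other overhead), it assumes every partition is held kfactor + 1 times, and `usable_fraction` is a hypothetical headroom parameter, not a VoltDB recommendation. All input figures are made up.

```python
import math


def hosts_for_ram(data_gb: float, ram_per_host_gb: float,
                  kfactor: int, usable_fraction: float) -> int:
    """Estimate the host count needed to hold the data in memory."""
    copies = kfactor + 1                  # every partition is stored this many times
    needed_gb = data_gb * copies          # total memory across the cluster
    usable_gb = ram_per_host_gb * usable_fraction  # hypothetical headroom per host
    hosts = math.ceil(needed_gb / usable_gb)
    return max(hosts, copies)             # need at least kfactor + 1 hosts


# Hypothetical inputs: 500 GB of partitioned data, 128 GB hosts,
# kfactor=2, and half of each host's RAM budgeted for the database.
print(hosts_for_ram(data_gb=500, ram_per_host_gb=128, kfactor=2, usable_fraction=0.5))  # 24
```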

You can contact VoltDB at info@voltdb.com or ask more questions at http://chat.voltdb.com if you'd like to get help with an evaluation or discuss cluster sizing and planning with an expert.

Disclaimer: I work for VoltDB.

elixenide
BenjaminBallard
  • To choose partition keys that keep the partitions as balanced as possible, which kinds of columns should be picked as keys? Which aspects should be considered? Which hash algorithm would you choose for partitioning? And what is the maximum percentage of memory VoltDB should occupy to avoid affecting the stability of the OS? – Simon Gao Jan 31 '18 at 03:05
  • And one more thing: why is VoltDB CP (Consistency, Partition-tolerance)? In what situations will VoltDB lose availability? – Simon Gao Jan 31 '18 at 03:15
  • Could you please send me a list of the features that are included in the Enterprise Edition but not in the Community Edition? – Simon Gao Jan 31 '18 at 03:25