Let's imagine we have a server with 10 disks, and I am going to create a YugabyteDB cluster with a replication factor of 5 (and 20-tablet sharding). I have two options:

  1. Create 5 nodes and give each tserver 2 disks to use, with 100 tablets per node (though I'm not sure whether a tserver can use 2 disks and balance tablets between them).

  2. Create 10 nodes and give each tserver 1 disk to use, with 50 tablets per node.

Which one is recommended, if there is any difference?

Ali Zeinali
  • In your question, am I understanding correctly that by "server" you mean a bare metal server, and by "node" you mean a VM running on that server? If so, why do you want to replicate the data 5 times on the same physical server? If the server loses power, you'll still have an outage. Wouldn't it be better to replicate the data across VMs on different servers (ideally across different zones/failure domains)? – Kannan Muthukkaruppan Jan 25 '20 at 20:54
  • Yes, you are right, I meant a bare metal server. Thanks for the suggestion, we will consider it, but at the moment it is our only option. Actually, the point of my question was to find out how Yugabyte works with multiple disks, and I got the answer. – Ali Zeinali Jan 26 '20 at 07:32

1 Answer

Either of those options would work. A yb-tserver can use multiple disks: it spreads the data for the tablets it hosts across all the disks it is given.
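As a rough illustration of the multi-disk case, the sketch below builds a comma-separated data-directory list of the kind passed to yb-tserver through its `--fs_data_dirs` flag, and then spreads tablets over those directories round-robin. The mount paths and the round-robin policy are assumptions made purely for illustration; the actual placement logic inside yb-tserver may differ.

```python
from collections import Counter

# Hypothetical mount points for the two disks given to one yb-tserver.
data_dirs = ["/mnt/d0", "/mnt/d1"]

# The data directories are passed as a single comma-separated flag value.
fs_data_dirs_flag = "--fs_data_dirs=" + ",".join(data_dirs)


def spread_tablets(num_tablets, dirs):
    """Assign tablet indexes to directories round-robin (illustrative policy only)."""
    return {tablet: dirs[tablet % len(dirs)] for tablet in range(num_tablets)}


# 100 tablets per node, as in option #1 of the question.
placement = spread_tablets(100, data_dirs)
per_disk = Counter(placement.values())

print(fs_data_dirs_flag)
print(dict(per_disk))
```

Under this simplified policy each of the two disks ends up holding half of the node's tablets.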

That said, other factors matter as well. Some of them are outlined below:

  • The most important consideration is the size of the machines (number of vCPUs / CPU cores) in the two cases. We recommend nodes with at least 8 or 16 cores for optimal performance. In your setup, assuming that option #2 would use smaller machines with half the number of vCPUs (so that the aggregate vCPU count across the cluster is the same in both cases), pick whichever setup gets you to at least 8 cores per node. If both setups have fewer than 8 vCPUs per node, then option #1 is better since each node has more cores.

  • Assuming both options satisfy the above point, a second consideration is the impact of a failure. With more nodes, the failure of any one node affects a smaller share of the cluster than it does with fewer nodes, so from this perspective option #2 (10 nodes) is better. Of course, the reality is a bit more nuanced: factors such as whether it is a multi-zone or a single-zone setup will also affect this decision.
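The two considerations above can be put into numbers with a small sketch. The total core count (80) is a hypothetical figure, not something from the question, and the failure figure assumes load is spread evenly across nodes, so treat the output as an illustration rather than a sizing tool.

```python
def compare(options, total_cores, min_cores=8):
    """Summarize each option: cores per node and share of nodes lost to one failure."""
    rows = []
    for name, nodes in options.items():
        cores_per_node = total_cores // nodes
        rows.append({
            "option": name,
            "nodes": nodes,
            "cores_per_node": cores_per_node,
            # The answer suggests at least 8 cores per node.
            "meets_min_cores": cores_per_node >= min_cores,
            # With evenly spread load, one failed node takes out 1/nodes of the cluster.
            "share_lost_on_one_failure": 1 / nodes,
        })
    return rows


# Node counts from the question; the total core count is an assumed value.
options = {"#1": 5, "#2": 10}
rows = compare(options, total_cores=80)

for row in rows:
    print(row)
```

With these assumed numbers both options reach 8 cores per node, and a single node failure removes 20% of the nodes with 5 nodes versus 10% with 10 nodes. Halving `total_cores` to 40 would leave only the 5-node layout at 8 cores per node.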

Hope that helps.