
Let's say I want to achieve maximum usable capacity with data resilience on a 3-OSD-node setup where each node contains 2x 1TB OSDs.

Is it safe to run 3 Ceph nodes with 2-way replication?

What are the pros and cons of using 2-way? Will it cause data split-brain?

Last but not least, what failure-domain fault tolerance will we get when running 2-way replication?

Thanks!

chrone

1 Answer


Sometimes even three replicas are not enough, e.g. if the SSD disks (from a cache tier) fail together or one by one:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-October/005672.html

With two OSDs you can even manually set a minimum of 1 replica and a maximum of 2 (I didn't manage to get this set automatically in the case of one failed OSD out of three):

osd pool default size = 2 # Write an object 2 times

osd pool default min size = 1 # Allow writing 1 copy in a degraded state

But the command ceph osd pool set mypoolname min_size 1 sets this for one specific pool, not just as the default setting.
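As a sketch, the per-pool equivalents of the defaults above look like this (the pool name mypoolname is illustrative; these are standard ceph CLI calls and need a running cluster):

```shell
# Set the replication level for one pool (not the cluster-wide default).
ceph osd pool set mypoolname size 2       # keep 2 copies of each object
ceph osd pool set mypoolname min_size 1   # allow I/O with only 1 copy left

# Verify the current values for that pool.
ceph osd pool get mypoolname size
ceph osd pool get mypoolname min_size
```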

For n = 4 nodes, each with 1 OSD and 1 mon, and settings of replica min_size 1 and size 4, three OSDs can fail, but only one mon can fail (monitor quorum means more than half must survive). 4 + 1 monitors are required to tolerate two failed monitors (at least one of them should be external, without an OSD). With 8 monitors (four of them external) three mons can fail, so even three nodes, each with 1 OSD and 1 mon, can fail. I am not sure that a setup with 8 monitors is possible.
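The quorum arithmetic above can be checked with a quick sketch (plain shell arithmetic; the monitor counts are just the ones discussed):

```shell
# Majority quorum: strictly more than half of the monitors must survive,
# so the number of monitors that may fail is n minus the smallest majority.
for n in 3 5 8; do
  quorum=$(( n / 2 + 1 ))      # smallest strict majority
  can_lose=$(( n - quorum ))   # monitors that may fail
  echo "mons=$n quorum=$quorum can_lose=$can_lose"
done
# → mons=3 quorum=2 can_lose=1
# → mons=5 quorum=3 can_lose=2
# → mons=8 quorum=5 can_lose=3
```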

Thus, for three nodes, each with one monitor and one OSD, the only reasonable settings are replica min_size 2 and size 3 or 2. Only one node can fail. If you have external monitors and you set min_size to 1 (this is very dangerous) and size to 2 or 1, then 2 nodes can be down. But with one replica (no copy, only the original data) you can lose your job very soon.

42n4
  • Thanks for the answer. I think I should stick with the default of 3 replicas then. My goal was to get maximum usable storage capacity with 3 OSD nodes while still having redundancy should one fail, but I would like to avoid data split-brain. :D – chrone Dec 14 '16 at 03:41
  • Monitor quorum is the main problem, and of course 3 replicas (min_size 2 and size 3) with the possibility of one failed node is the best option. – 42n4 Dec 17 '16 at 00:50