What is your experience with RAID 1 on an HDP cluster?

I have two options in mind:

  1. Set up RAID 1 for the master and ZooKeeper nodes, and don't use RAID at all on slave nodes such as Kafka brokers, HBase RegionServers and YARN NodeManagers.

Even if I lose one slave node, I will still have two other replicas. In my opinion, RAID will only slow down my cluster.

  2. Despite everything, set up every node with RAID 1.

What do you think about it? What is your experience with HDP and RAID? What do you think about using RAID 0 for slave nodes?

Community

1 Answer

I'd recommend no RAID at all on Hadoop hosts. There is one caveat: if you are running services like Oozie and the Hive metastore, which use a relational DB behind the scenes, RAID may well make sense on the DB host.

On a master node, assuming it runs the NameNode, ZooKeeper and so on, the redundancy is generally built into the service. With NameNode HA, all the data is stored on both NameNodes. For ZooKeeper, if you lose one node out of three, the other two nodes still have all the information.
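The ZooKeeper point can be checked with a little arithmetic. The sketch below assumes the usual majority-quorum rule (an ensemble keeps working while more than half of its nodes are up); the function name `tolerated_failures` is made up for illustration.

```python
def tolerated_failures(ensemble_size: int) -> int:
    """Nodes a majority-quorum ensemble can lose and still have a quorum."""
    quorum = ensemble_size // 2 + 1  # strict majority of the ensemble
    return ensemble_size - quorum


if __name__ == "__main__":
    for n in (3, 5):
        print(f"{n} nodes: quorum survives {tolerated_failures(n)} failure(s)")
```

Under that assumption a three-node ensemble survives the loss of one node, which matches the case described above, without any RAID underneath it.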

ZooKeeper likes fast disks, so ideally dedicate a full disk to it. If you have NameNode HA, give the NameNode edits directory and each JournalNode a dedicated disk too.

For the slave nodes, the DataNode will write across all of its disks, effectively striping the data anyway. Each write is at most the HDFS block size, so if you were writing a large file, you could get 128 MB on disk 1, then the next 128 MB on disk 2, and so on.
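To make the striping effect concrete, here is a rough simulation. It assumes blocks are handed to the disks in simple round-robin order and that the whole file lands on one DataNode, which is a simplification of what HDFS really does; `place_blocks` and the `/data1` to `/data3` mount points are hypothetical names, not part of any Hadoop API.

```python
BLOCK_SIZE_MB = 128  # block size used in the example above


def place_blocks(file_size_mb: int, disks: list[str]) -> dict[str, list[int]]:
    """Split a file into blocks and assign them to disks round-robin."""
    placement: dict[str, list[int]] = {disk: [] for disk in disks}
    offset = 0
    index = 0
    while offset < file_size_mb:
        # The last block may be smaller than the full block size.
        size = min(BLOCK_SIZE_MB, file_size_mb - offset)
        placement[disks[index % len(disks)]].append(size)
        offset += size
        index += 1
    return placement


if __name__ == "__main__":
    layout = place_blocks(500, ["/data1", "/data2", "/data3"])
    for disk, blocks in layout.items():
        print(disk, blocks)
```

Each disk ends up with roughly the same share of the file, which is the benefit RAID 0 would otherwise provide, but a single failed disk only takes its own blocks with it.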

Stephen ODonnell