
I have a Hadoop cluster running on a local cloud. Each data node has 8 disks, and all of them are allocated to Hadoop. I also want to set up a Kubernetes cluster on these nodes and use local storage. For this purpose, I decided to use a directory on one of the disks on each data node for Kubernetes persistent volume claims. I'm aware that this might create disk contention, but I will probably deal with that later.
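To make that concrete, this is roughly what I have in mind for the plain local-storage case: one local PersistentVolume per data node, backed by a directory on one of the disks. The paths, capacities, and node names below are just placeholders for illustration:

```yaml
# Sketch only: a no-provisioner StorageClass plus one local PV per data node.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-datanode-1
spec:
  capacity:
    storage: 100Gi              # placeholder size
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /data/disk1/k8s       # placeholder: directory on a disk that HDFS also uses
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - datanode-1    # placeholder node name
```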

My question is: since that directory sits on a disk that already has a file system on it (and is in use by HDFS), is it feasible to use Rook to handle the storage on Kubernetes? In other words, will Rook accept such a directory on each data node?
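For reference, what I would be trying is something like the CephCluster spec below. If I'm reading the Rook documentation correctly (the version current at the time of writing), the `storage` section can point OSDs at directories instead of raw devices; whether that works when the directory lives on a disk already formatted and used by HDFS is exactly what I'm unsure about. The image tag and path are placeholders:

```yaml
# Sketch only: a directory-backed Rook/Ceph cluster (not tested on my setup).
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: ceph/ceph:v13.2.2    # placeholder Ceph image
  dataDirHostPath: /var/lib/rook
  mon:
    count: 3
  storage:
    useAllNodes: true
    useAllDevices: false
    # Directory-backed OSDs instead of whole devices; this is the per-data-node
    # directory I described above (placeholder path).
    directories:
      - path: /data/disk1/rook
```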

Thanks,

  • Rook/Ceph is a Hadoop-compatible filesystem. Therefore, you wouldn't need HDFS – OneCricketeer Nov 07 '18 at 15:50
  • Well, I already have HDFS and can't change it. – Fatemeh Rouzbeh Nov 08 '18 at 16:03
  • Well, AFAIK, Rook doesn't communicate with HDFS and wouldn't be able to store data using the HDFS API. And you wouldn't want to put Rook data directories inside of the datanode volume mounts, so I'm just not sure what good HDFS is providing you in this setup. Plus, YARN can basically run and manage Docker containers in a similar fashion to Kubernetes (without `kubectl`), so do you really need to combine the two? – OneCricketeer Nov 08 '18 at 20:00
  • I'm kind of new to this stuff, but we already have data in Hadoop, and we wanted to use Kubernetes to manage our applications. But now your comment has made me curious: if YARN does what Kubernetes does, why is there so much momentum toward Kubernetes? By the way, I decided to deallocate one disk per data node from HDFS for Kubernetes. I think that way makes more sense. – Fatemeh Rouzbeh Nov 10 '18 at 18:31
  • Most Hadoop vendors have not acknowledged the Kubernetes movement (yet) and are still using older versions of YARN that do not support Docker containers. Plus, most companies don't trust Docker for security reasons. Combining that with managing Hadoop cluster access and distributing credentials securely into containers just makes the problem harder. Sure, Kubernetes has gained a large share in non-Hadoop environments, but Mesos can share ZooKeeper with Hadoop, so that's sometimes used as well, particularly before Spark supported Kubernetes – OneCricketeer Nov 10 '18 at 19:41
  • Thanks. By the way, I need to use Kubernetes for our work. – Fatemeh Rouzbeh Nov 12 '18 at 19:10
  • That's fine and all, but I wouldn't put them on the same machines unless you use [some MapR Hadoop cluster](https://mapr.com/blog/containers-kubernetes-and-mapr-the-time-is-now/). Otherwise, all the Hadoop services using YARN are taking available memory away from Kubernetes... It's not just disk you have to worry about – OneCricketeer Nov 12 '18 at 23:29
  • I'm pretty new to this area, and I know that if I say this to my manager he will say that for now this is enough for us :D But thanks for mentioning MapR, I will take a look at it. Is it a kind of storage management, like Rook? – Fatemeh Rouzbeh Nov 15 '18 at 16:05
  • MapR is a Hadoop vendor, like Cloudera or Hortonworks – OneCricketeer Nov 15 '18 at 18:34

0 Answers