Is it correct that master runs on a datanode?

Question

I'm using giraph-1.3 built with the yarn profile. To start, I configured 1 NameNode and 2 DataNodes on an EC2 cluster. My application works properly: I see the expected output in the logs (and in the output directory). I launched Giraph with the "-w 2" argument because I have two DataNodes.

In the userlogs of datanode1 I found the log of the first worker.
In the userlogs of datanode2 I found the log of the second worker and the log of the master too.

I expected to find the master's log on the NameNode, i.e. I expected the master to run on the NameNode. Is that right?

Maybe I have to configure another DataNode, and then I will find the master's logs on this new DataNode?

What file path are you seeing the logs in? Are you sure those are DataNode logs and not NodeManager logs? – OneCricketeer, Aug 31 '18 at 02:07
Your DataNodes don't run YARN jobs unless you've also installed a NodeManager on them. – OneCricketeer, Sep 02 '18 at 22:53

score 0 · Answer 1 · answered Sep 02 '18 at 20:35


As I understand it, Hadoop/Giraph works by creating containers on the DataNodes. Hadoop creates a container for the application master, then Giraph creates a container for the master. In addition, Giraph creates one container per worker, so the number of worker containers matches the -w parameter.
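A minimal sketch of the container count described above, assuming exactly one application-master container, one Giraph master container and one container per worker. The function `giraph_container_count` is hypothetical and not part of Giraph's API; it only restates the arithmetic.

```python
def giraph_container_count(workers: int) -> dict:
    """Count the YARN containers for a job launched with -w <workers>,
    assuming one application master, one Giraph master and one container
    per worker (hypothetical helper, not a Giraph API)."""
    if workers < 1:
        raise ValueError("a Giraph job needs at least one worker")
    return {
        "application_master": 1,  # YARN starts one AM per application
        "giraph_master": 1,       # requested by the AM for the Giraph master
        "workers": workers,       # one container per worker (-w)
        "total": workers + 2,
    }

print(giraph_container_count(2))
```

For the "-w 2" launch in the question this gives four containers in total, which two NodeManagers have to share.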


Cristina Bovi


Containers are actually created on NodeManagers, which ideally run as a service on the DataNodes. – OneCricketeer, Sep 02 '18 at 22:53

score 0 · Answer 2 · answered Sep 02 '18 at 22:57

YARN always creates an Application Master for every job.

You can start as many "workers" as your workload needs, but since you only have 2 DataNodes, you can have at most 2 NodeManagers, which caps your parallelism.

A NodeManager has a maximum amount of memory available to it, and the YARN containers for a job's tasks each get a portion of that memory to do their processing.
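To make that memory arithmetic concrete, here is a hypothetical sketch. It assumes each NodeManager offers a fixed amount of memory (what `yarn.nodemanager.resource.memory-mb` configures), that the scheduler rounds every request up to a multiple of `yarn.scheduler.minimum-allocation-mb` and rejects requests above `yarn.scheduler.maximum-allocation-mb` (CapacityScheduler-style normalization; other schedulers may behave differently). The function names and all numbers are illustrative, not taken from the question.

```python
import math

def normalize_request(request_mb: int, min_alloc_mb: int, max_alloc_mb: int) -> int:
    """Round a container request up to a multiple of the minimum allocation
    (assumed CapacityScheduler-style normalization)."""
    if request_mb <= 0:
        raise ValueError("request must be positive")
    if request_mb > max_alloc_mb:
        raise ValueError("request exceeds the maximum allocation")
    return math.ceil(request_mb / min_alloc_mb) * min_alloc_mb

def cluster_capacity(node_managers: int, node_mb: int, request_mb: int,
                     min_alloc_mb: int, max_alloc_mb: int) -> dict:
    """How many equally sized containers the cluster can host at once."""
    size = normalize_request(request_mb, min_alloc_mb, max_alloc_mb)
    per_node = node_mb // size  # containers one NodeManager can hold
    return {"container_mb": size, "per_node": per_node,
            "cluster_total": per_node * node_managers}

def busiest_node_minimum(containers: int, node_managers: int) -> int:
    """Pigeonhole bound: at least one NodeManager hosts this many containers."""
    return math.ceil(containers / node_managers)

# Illustrative values: 2 NodeManagers with 4096 MB each, 1500 MB requests,
# 1024 MB minimum and 4096 MB maximum allocation.
print(cluster_capacity(2, 4096, 1500, 1024, 4096))
print(busiest_node_minimum(4, 2))
```

With these numbers each 1500 MB request is rounded up to 2048 MB, so each node fits two containers. Four containers (application master, Giraph master and two workers) on two NodeManagers means at least one node hosts two of them, which is consistent with the master and the second worker sharing datanode2.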

