
My big ol' master node hardware is doing practically nothing during my Hadoop/Spark runs, because YARN picks an arbitrary slave node for the ApplicationMaster on each job. I like the old Hadoop 1 way better: a lot of log chasing and SSH pain was avoided that way when things went wrong.

Is it possible to make the ApplicationMaster run on the master node?

Judge Mental
  • Technically the YARN API has methods for these manipulations, like this one: https://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/yarn/api/records/ApplicationSubmissionContext.html#setAMContainerResourceRequest(org.apache.hadoop.yarn.api.records.ResourceRequest) – AdamSkywalker Jan 27 '17 at 23:30
  • But I've never seen any simple Hadoop example showing how to use it. – AdamSkywalker Jan 27 '17 at 23:32

1 Answer


It's possible with Spark and YARN node labels.

  1. Label your nodes with YARN node labels
  2. Set the spark.yarn.am.nodeLabelExpression property so the ApplicationMaster is only scheduled on nodes carrying that label
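The two steps above can be sketched roughly as follows. The label name `master`, the NodeManager address, and the application jar/class are all placeholders, not values from the question; adapt them to your cluster:

```shell
# Node labels must be enabled in yarn-site.xml beforehand, e.g.:
#   yarn.node-labels.enabled = true
#   yarn.node-labels.fs-store.root-dir = hdfs:///yarn/node-labels

# 1. Create a cluster-level label (the name "master" is arbitrary)
yarn rmadmin -addToClusterNodeLabels "master"

# Attach the label to the node where the AM should run
# (hostname:port is a placeholder for your NodeManager address)
yarn rmadmin -replaceLabelsOnNode "master-host.example.com:45454=master"

# 2. Submit so the ApplicationMaster is restricted to labeled nodes;
# executors remain unconstrained unless you also set
# spark.yarn.executor.nodeLabelExpression
spark-submit \
  --master yarn \
  --conf spark.yarn.am.nodeLabelExpression=master \
  --class com.example.MyApp \
  myapp.jar
```

Note that node label expressions require YARN 2.6 or later; on older versions the Spark property is silently ignored.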


Thomas Decaux