Questions tagged [hadoop-yarn]

YARN (Yet Another Resource Negotiator) is a key component of the second-generation Apache Hadoop infrastructure. DO NOT USE THIS TAG for the JavaScript/Node.js Yarn package manager (use [yarnpkg] instead)! Originally described by Apache as a redesigned resource manager, YARN is now characterized as a large-scale, distributed operating system for big data applications, including next-generation MapReduce (MRv2).

In the Big Data business, running fewer, larger clusters is cheaper than running many small clusters. Larger clusters also process larger data sets and support more jobs and users.

The Apache Hadoop MapReduce framework has hit a scalability limit around 4,000 machines. We are developing the next generation of Apache Hadoop MapReduce that factors the framework into a generic resource scheduler and a per-job, user-defined component that manages the application's execution. Since downtime is more expensive at scale, high availability is built in from the beginning, as are security and multi-tenancy to support many users on the larger clusters. The new architecture will also increase innovation, agility and hardware utilization.

Background

The current implementation of the Hadoop MapReduce framework is showing its age.

Given observed trends in cluster sizes and workloads, the MapReduce JobTracker needs a drastic overhaul to address several deficiencies in its scalability, memory consumption, threading model, reliability and performance. Over the last five years there have been spot fixes, but lately these have come at an ever-growing cost, as evinced by the increasing difficulty of making changes to the framework. The architectural deficiencies, and the corrective measures, are both old and well understood; they were documented as far back as late 2007 on the MapReduce JIRA: MAPREDUCE-278.

From an operational perspective, the current Hadoop MapReduce framework forces a system-wide upgrade for any minor or major change, such as bug fixes, performance improvements and new features. Worse, it forces every single customer of the cluster to upgrade at the same time, regardless of their interests; this wastes customers' expensive cycles as they validate the new version of Hadoop for their applications.

The Next Generation of MapReduce

Figure: YARN Architecture

The fundamental idea of the re-architecture is to divide the two major functions of the JobTracker, resource management and job scheduling/monitoring, into separate components. The new ResourceManager manages the global assignment of compute resources to applications, and the per-application ApplicationMaster manages the application's scheduling and coordination. An application is either a single job in the classical sense of MapReduce, or a DAG of such jobs. The ResourceManager and the per-machine NodeManager server, which manages the user processes on that machine, form the computation fabric. The per-application ApplicationMaster is, in effect, a framework-specific library and is tasked with negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the tasks.
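To make the client/ResourceManager/ApplicationMaster hand-off concrete, here is a minimal sketch of submitting an application through the YARN client API (Hadoop 2.x, Java). The application name, queue, AM command and container size are placeholder values, and error handling and local resources are omitted.

    import java.util.Collections;
    import java.util.HashMap;

    import org.apache.hadoop.yarn.api.records.ApplicationId;
    import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
    import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
    import org.apache.hadoop.yarn.api.records.LocalResource;
    import org.apache.hadoop.yarn.api.records.Resource;
    import org.apache.hadoop.yarn.client.api.YarnClient;
    import org.apache.hadoop.yarn.client.api.YarnClientApplication;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class SubmitToYarn {
      public static void main(String[] args) throws Exception {
        YarnConfiguration conf = new YarnConfiguration();

        // The client talks to the ResourceManager.
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(conf);
        yarnClient.start();

        // Ask the ResourceManager for a new application id.
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
        appContext.setApplicationName("demo-app");   // placeholder name
        appContext.setQueue("default");              // target scheduler queue

        // The command that will start the ApplicationMaster inside its container.
        ContainerLaunchContext amContainer = ContainerLaunchContext.newInstance(
            new HashMap<String, LocalResource>(),             // files to localize (none here)
            new HashMap<String, String>(),                    // environment
            Collections.singletonList("/bin/echo hello-AM"),  // placeholder AM command
            null, null, null);
        appContext.setAMContainerSpec(amContainer);

        // Resources the ResourceManager should set aside for the AM container.
        appContext.setResource(Resource.newInstance(1024 /* MB */, 1 /* vcore */));

        ApplicationId appId = yarnClient.submitApplication(appContext);
        System.out.println("Submitted application " + appId);
        yarnClient.stop();
      }
    }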

The ResourceManager supports hierarchical application queues, and those queues can be guaranteed a percentage of the cluster's resources. It is a pure scheduler in the sense that it performs no monitoring or tracking of application status. It also offers no guarantees about restarting failed tasks, whether they fail due to application errors or hardware failures.
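As an illustration (not an authoritative configuration), the sketch below sets the kind of properties that define such a queue hierarchy for the CapacityScheduler; in a real cluster they live in capacity-scheduler.xml, and the queue names and percentages here are made up.

    import org.apache.hadoop.conf.Configuration;

    public class QueueConfigSketch {
      public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Two child queues under the root queue...
        conf.set("yarn.scheduler.capacity.root.queues", "prod,dev");

        // ...each guaranteed a percentage of the cluster's resources.
        conf.set("yarn.scheduler.capacity.root.prod.capacity", "70");
        conf.set("yarn.scheduler.capacity.root.dev.capacity", "30");

        // A queue may borrow idle capacity up to an elastic maximum.
        conf.set("yarn.scheduler.capacity.root.dev.maximum-capacity", "50");

        System.out.println("prod guaranteed share: "
            + conf.get("yarn.scheduler.capacity.root.prod.capacity") + "%");
      }
    }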

The ResourceManager performs its scheduling function based on the resource requirements of the applications; each application has multiple resource-request types that represent the resources required for its containers. The resource requests include memory, CPU, disk, network, etc. Note that this is a significant change from the current model of fixed-type slots in Hadoop MapReduce, which has a significant negative impact on cluster utilization. The ResourceManager has a scheduler policy plug-in, which is responsible for partitioning the cluster resources among the various queues, applications, etc. Scheduler plug-ins can be based, for example, on the current CapacityScheduler and FairScheduler.
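To make the contrast with fixed map/reduce slots concrete, here is a small sketch (sizes and priority are illustrative) that builds a container request the way an ApplicationMaster would, sizing it by memory and virtual cores, and shows the configuration property that selects the scheduler plug-in.

    import org.apache.hadoop.yarn.api.records.Priority;
    import org.apache.hadoop.yarn.api.records.Resource;
    import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class ResourceRequestSketch {
      public static void main(String[] args) {
        // A resource ask is expressed in memory and vcores, not in fixed slots.
        Resource capability = Resource.newInstance(2048 /* MB */, 2 /* vcores */);
        Priority priority = Priority.newInstance(0);

        // nodes/racks left null: any host in the cluster is acceptable.
        ContainerRequest ask = new ContainerRequest(capability, null, null, priority);
        System.out.println("Asking the scheduler for: " + ask);

        // The scheduler itself is a plug-in, chosen by configuration; the class
        // below is the CapacityScheduler shipped with Hadoop 2.x.
        YarnConfiguration conf = new YarnConfiguration();
        conf.set("yarn.resourcemanager.scheduler.class",
            "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler");
      }
    }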

The NodeManager is the per-machine framework agent that is responsible for launching the applications' containers, monitoring their resource usage (CPU, memory, disk, network) and reporting it to the Scheduler.
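What the NodeManager actually launches is described by a ContainerLaunchContext. The hedged sketch below shows the ApplicationMaster-side call into the NMClient API once a container has been allocated; the allocatedContainer argument and the command are placeholders.

    import java.util.Collections;
    import java.util.HashMap;

    import org.apache.hadoop.yarn.api.records.Container;
    import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
    import org.apache.hadoop.yarn.api.records.LocalResource;
    import org.apache.hadoop.yarn.client.api.NMClient;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class LaunchOnNodeManager {
      // 'allocatedContainer' would come from an AMRMClient allocate() response.
      static void launch(Container allocatedContainer) throws Exception {
        NMClient nmClient = NMClient.createNMClient();
        nmClient.init(new YarnConfiguration());
        nmClient.start();

        // Everything the NodeManager needs to start the process:
        // files to localize, environment variables, and the command line itself.
        ContainerLaunchContext ctx = ContainerLaunchContext.newInstance(
            new HashMap<String, LocalResource>(),
            new HashMap<String, String>(),
            Collections.singletonList("/bin/date"),   // placeholder task command
            null, null, null);

        // The NodeManager on the container's host starts the process and
        // then monitors its resource usage.
        nmClient.startContainer(allocatedContainer, ctx);
      }
    }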

The per-application ApplicationMaster has the responsibility of negotiating appropriate resource containers from the Scheduler, launching tasks, tracking their status, monitoring their progress, and handling task failures.
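Putting the pieces together, here is a hedged skeleton of that negotiation loop using the AMRMClient API: register with the ResourceManager, ask for containers, heartbeat until they are granted, then unregister. It would only actually run inside an AM container started by the ResourceManager (the usual AM credentials are assumed), and the container count and sizes are arbitrary.

    import java.util.List;

    import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
    import org.apache.hadoop.yarn.api.records.Container;
    import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
    import org.apache.hadoop.yarn.api.records.Priority;
    import org.apache.hadoop.yarn.api.records.Resource;
    import org.apache.hadoop.yarn.client.api.AMRMClient;
    import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class AppMasterSketch {
      public static void main(String[] args) throws Exception {
        AMRMClient<ContainerRequest> rm = AMRMClient.createAMRMClient();
        rm.init(new YarnConfiguration());
        rm.start();

        // Register this ApplicationMaster with the ResourceManager's scheduler.
        rm.registerApplicationMaster("localhost", 0, "");

        // Ask for two containers of 1 GB / 1 vcore each.
        Resource capability = Resource.newInstance(1024, 1);
        for (int i = 0; i < 2; i++) {
          rm.addContainerRequest(
              new ContainerRequest(capability, null, null, Priority.newInstance(0)));
        }

        // Heartbeat until the scheduler has granted the containers.
        int granted = 0;
        while (granted < 2) {
          AllocateResponse response = rm.allocate(0.1f);
          List<Container> allocated = response.getAllocatedContainers();
          granted += allocated.size();
          // Each granted container would now be started through NMClient
          // (see the NodeManager sketch above) and tracked to completion.
          Thread.sleep(1000);
        }

        // Tell the ResourceManager we are done so the AM container can be freed.
        rm.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "done", "");
        rm.stop();
      }
    }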

MRv2 maintains API compatibility with the previous stable release (hadoop-1.x). This means that existing MapReduce jobs should still run unchanged on top of MRv2 with just a recompile.
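As an illustration of that claim (a sketch, not an official example), the word-count driver below uses only the stable org.apache.hadoop.mapreduce API plus the stock TokenCounterMapper and IntSumReducer classes; recompiling it against the MRv2 jars and pointing mapreduce.framework.name at yarn is all that is needed to run it on a YARN cluster. Input and output paths come from the command line.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

    public class WordCountOnYarn {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("mapreduce.framework.name", "yarn");   // run on YARN instead of locally

        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountOnYarn.class);

        // Stock mapper/reducer shipped with Hadoop: tokenize, then sum the counts.
        job.setMapperClass(TokenCounterMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }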

3897 questions
26
votes
2 answers

Apache Hadoop Yarn vs. Kubernetes

Since version 2.6, Apache Hadoop YARN handles Docker containers. Basically it distributes the requested number of containers on a Hadoop cluster, restarts failed containers and so on. Kubernetes seemed to do the same. Where are the major…
Simon_Prewo_Frankfurt
  • 1,209
  • 2
  • 11
  • 18
26
votes
5 answers

Spark runs on Yarn cluster exitCode=13:

I am a Spark/YARN newbie and ran into exitCode=13 when I submit a Spark job on a YARN cluster. When the Spark job is running in local mode, everything is fine. The command I used is: /usr/hdp/current/spark-client/bin/spark-submit --class…
user_not_found
  • 471
  • 2
  • 6
  • 13
26
votes
1 answer

How to run a Kafka connect worker in YARN?

I'm playing with Kafka-Connect. I've got the HDFS connector working both in stand-alone mode and distributed mode. They advertise that the workers (which are responsible for running the connectors) can be managed via YARN. However, I haven't seen…
hba
  • 7,406
  • 10
  • 63
  • 105
25
votes
3 answers

Hadoop namenode : Single point of failure

The Namenode in the Hadoop architecture is a single point of failure. How do people who have large Hadoop clusters cope with this problem? Is there an industry-accepted solution that has worked well wherein a secondary Namenode takes over in case…
rakeshr
  • 1,027
  • 3
  • 17
  • 25
25
votes
7 answers

Spark on yarn mode end with "Exit status: -100. Diagnostics: Container released on a *lost* node"

I am trying to load a database with 1 TB of data into Spark on AWS using the latest EMR. The running time is so long that it doesn't finish even in 6 hours; after running 6h30m, I get an error announcing that the Container was released on a lost node…
John Zeng
  • 1,174
  • 2
  • 9
  • 22
24
votes
2 answers

How to execute Spark programs with Dynamic Resource Allocation?

I am using the spark-submit command for executing Spark jobs with parameters such as: spark-submit --master yarn-cluster --driver-cores 2 \ --driver-memory 2G --num-executors 10 \ --executor-cores 5 --executor-memory 2G \ --class…
Arvind Kumar
  • 1,325
  • 1
  • 19
  • 27
23
votes
2 answers

Why does Yarn on EMR not allocate all nodes to running Spark jobs?

I'm running a job on Apache Spark on Amazon Elastic Map Reduce (EMR). Currently I'm running on emr-4.1.0, which includes Amazon Hadoop 2.6.0 and Spark 1.5.0. When I start the job, YARN has correctly allocated all the worker nodes to the Spark job…
retnuH
  • 1,525
  • 2
  • 11
  • 18
23
votes
3 answers

Standalone Manager Vs. Yarn Vs. Mesos

On a 3-node Spark/Hadoop cluster, which scheduler (manager) will work efficiently? Currently I am using the Standalone manager, but for each Spark job I have to explicitly specify all resource parameters (e.g. cores, memory etc.), which I want to avoid. I have…
Abhinandan Satpute
  • 2,558
  • 6
  • 25
  • 43
22
votes
3 answers

Spark + EMR using Amazon's "maximizeResourceAllocation" setting does not use all cores/vcores

I'm running an EMR cluster (version emr-4.2.0) for Spark using the Amazon specific maximizeResourceAllocation flag as documented here. According to those docs, "this option calculates the maximum compute and memory resources available for an…
retnuH
  • 1,525
  • 2
  • 11
  • 18
22
votes
3 answers

"sparkContext was shut down" while running spark on a large dataset

When running a Spark job on a cluster past a certain data size (~2.5 GB) I am getting either "Job cancelled because SparkContext was shut down" or "executor lost". When looking at the YARN GUI I see that the job that got killed was successful. There are no…
Aleksander Zendel
  • 463
  • 1
  • 3
  • 12
22
votes
4 answers

Apache Spark: setting executor instances does not change the executors

I have an Apache Spark application running on a YARN cluster (Spark has 3 nodes on this cluster) in cluster mode. When the application is running, the Spark UI shows that 2 executors (each running on a different node) and the driver are running on…
user4688877
  • 223
  • 1
  • 2
  • 6
21
votes
10 answers

yarn command not found after installing via npm

As per the yarn installation for yarn v2, they want you to install using npm install -g yarn. So I ran sudo npm install -g yarn on Ubuntu 20.04. But after I do that, it says command not found. ❯ sudo npm install -g yarn > yarn@1.22.10 preinstall…
cclloyd
  • 8,171
  • 16
  • 57
  • 104
21
votes
3 answers

How to exit spark-submit after the submission

When submitting a Spark streaming program using spark-submit (YARN mode), it keeps polling the status and never exits. Is there any option in spark-submit to exit after the submission? ===Why this troubles me=== The streaming program will run forever and I…
Peter Chan
  • 255
  • 1
  • 2
  • 7
21
votes
9 answers

MapReduce job hangs, waiting for AM container to be allocated

I tried to run a simple word count as a MapReduce job. Everything works fine when run locally (all work done on the Name Node). But when I try to run it on a cluster using YARN (adding mapreduce.framework.name=yarn to mapred-site.conf), the job hangs. I came…
KaP
  • 387
  • 1
  • 2
  • 12
21
votes
3 answers

Difference between Application Manager and Application Master in YARN?

I understood how MRv1 works. Now I am trying to understand MRv2. What's the difference between the Application Manager and the Application Master in YARN?
hadooper
  • 726
  • 1
  • 6
  • 18