4

I understand what YARN and Spark are, but I want to know when I should use YARN and when I should use the Spark processing engine. What use cases would help me identify the difference between YARN and Spark?

user4157124
chandu kavar

2 Answers

20

You cannot compare YARN and Spark directly. YARN is a distributed container manager, like Mesos for example, whereas Spark is a data processing tool. Spark can run on YARN, the same way Hadoop MapReduce can run on YARN. It just happens that Hadoop MapReduce ships with YARN, whereas Spark does not.

If you mean comparing MapReduce and Spark, I suggest reading this other answer.

codebox
matthieun
7

Apache Spark can run on YARN, on Mesos, or in standalone mode.

Spark in standalone mode - all resource management and job scheduling are handled by Spark's own built-in cluster manager.

Spark on YARN - YARN is the resource manager introduced with MRv2 (Hadoop 2). It supports not only native Hadoop workloads but also Spark, Kafka, Elasticsearch, and other custom applications.

Spark on Mesos - Spark also supports Mesos, which is another general-purpose resource manager.
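In practice, the choice between these three shows up mostly in the `--master` value passed to `spark-submit`. Here is a minimal sketch that builds the command line for each option. The host names, ports, and the `my_app.py` file name are placeholders I made up for illustration, and `spark_submit_command` is a hypothetical helper, not part of Spark.

```python
import shlex

# Master URL formats accepted by spark-submit's --master option.
# Host names and ports are placeholders, not values from a real cluster.
MASTER_URLS = {
    "standalone": "spark://master-host:7077",  # Spark's built-in cluster manager
    "yarn": "yarn",  # cluster location is read from HADOOP_CONF_DIR / YARN_CONF_DIR
    "mesos": "mesos://mesos-host:5050",
}


def spark_submit_command(cluster_manager, app="my_app.py"):
    """Return the spark-submit argument list for the chosen cluster manager."""
    master = MASTER_URLS[cluster_manager]
    return ["spark-submit", "--master", master, app]


if __name__ == "__main__":
    for name in MASTER_URLS:
        # Print each command as a single shell-quoted line.
        print(shlex.join(spark_submit_command(name)))
```

The application code itself stays the same in all three cases; only the resource manager that hands out executors changes.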

Advantages of Spark on YARN

  • YARN allows you to dynamically share and centrally configure the same pool of cluster resources between all frameworks that run on YARN.
  • YARN schedulers can be used for Spark jobs.
  • At the time of writing, only with YARN can Spark run against Kerberized Hadoop clusters and use secure authentication between its processes.
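The scheduler and Kerberos points above map onto a few `spark-submit` options: `--queue` selects the YARN queue, and `--principal` with `--keytab` supply Kerberos credentials. A rough sketch follows, assuming a hypothetical helper `yarn_submit_args`; the queue name, principal, and keytab path in the usage line are invented for illustration.

```python
def yarn_submit_args(app, queue=None, principal=None, keytab=None):
    """Build spark-submit arguments for a job submitted to YARN in cluster mode."""
    args = ["spark-submit", "--master", "yarn", "--deploy-mode", "cluster"]
    if queue:
        # YARN queue the job is scheduled into (shared with other frameworks).
        args += ["--queue", queue]
    if principal and keytab:
        # Kerberos identity used to authenticate against a secured cluster.
        args += ["--principal", principal, "--keytab", keytab]
    args.append(app)
    return args


if __name__ == "__main__":
    # All values below are illustrative placeholders.
    print(yarn_submit_args("my_app.py", queue="analytics",
                           principal="etl@EXAMPLE.COM",
                           keytab="/path/to/etl.keytab"))
```

The Kerberos options are only added when both a principal and a keytab are given, since one without the other is not usable.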

Link for more documentation on YARN, Spark.

To conclude: if you want to build a small, simple cluster that is independent of everything else, go for standalone mode. If you want to use an existing Hadoop cluster, go for YARN or Mesos.

Karthik
  • @chandukavar Except for the last sentence, this answer is not subjective or his 'opinion'. This answer in very simple terms explains the different scenarios for using Spark with or without Yarn. This should definitely be marked as the accepted answer as it's more comprehensive and more useful for the community than matthieun's answer (which is also correct). – pedram bashiri Nov 13 '19 at 19:13