I manage a cluster with several machines that is shared with colleagues; some of them use Spark and some use MapReduce.
Spark users usually open a context and keep it open for days or weeks, while MR jobs start and finish.
The problem is that the MR jobs often get stuck because:
- After X% of the map phase completes, the job starts running reducers (see the sketch after this list).
- Eventually there are a lot of reducers running and only 5-15 maps left waiting to execute.
- At this point there is not enough memory to start a new map, and the reducers cannot progress past 33% because the maps have not finished yet, producing a deadlock.
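If I understand the behavior correctly, the point at which reducers start is controlled by mapreduce.job.reduce.slowstart.completedmaps (by default a small fraction of completed maps). Below is a minimal sketch of raising it per job; the -D flag assumes the job's driver goes through ToolRunner/GenericOptionsParser, and the jar and class names are just placeholders. I am not sure this alone avoids the deadlock once reducers already hold containers.

```
# Sketch: delay reducer launch until ~95% of the maps have completed.
# Assumes the job's main class uses ToolRunner so -D properties are picked up;
# my-job.jar, com.example.MyJob and the paths are placeholders.
hadoop jar my-job.jar com.example.MyJob \
  -D mapreduce.job.reduce.slowstart.completedmaps=0.95 \
  /input/path /output/path
```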
The only way to solve this problem so far has been to kill one of the Spark contexts and let the maps finish.
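For context, the manual workaround currently looks roughly like this (the application ID below is only a placeholder):

```
# Find the long-running Spark applications and kill one to free containers.
yarn application -list -appTypes SPARK
yarn application -kill application_1234567890123_0042
```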
Is there a way to configure YARN to avoid this problem?
Thanks.