
We have a nice, big, complicated Elastic MapReduce job that has wildly different hardware constraints for the Mapper vs. Collector vs. Reducer.

The issue is: for the Mappers, we need tonnes of lightweight machines to run several mappers in parallel (all good there); the collectors are more memory-hungry, but it should still be OK to give them about 6GB of peak heap each... but the problem is the Reducers. When one of those kicks off, it will grab about 32-64GB for processing.

The result is that we get a round-robin type of task death: the full memory of a box gets consumed, which causes both the mapper and the reducer on it to be restarted elsewhere.

The simplest approach would be if we could somehow specify a way to have the reducers run on a different "group" (a handful of ginormous boxes) while having the mappers/collectors running on smaller boxes. This could also lead to significant cost savings, as we really shouldn't be sizing the nodes the mappers run on to the demands of the reducers.

An alternative would be to "break up" the job so that a second cluster can be spun up to process the mappers'/collectors' output--but that's obviously sub-optimal.

So, the questions are:

  • Is there a way to specify which "group" a mapper or a reducer will run on in Elastic MapReduce and/or Hadoop?
  • Is there a way to prevent the reducers from starting until all the mappers are done?
  • Does anyone have other ideas on how to approach this?

Cheers!

David Beveridge

1 Answer


During a Hadoop MapReduce job, Reducers start running after all the Mappers are done. The output from the Map phase is partitioned to decide which Reducer receives which data, then shuffled and sorted. So, Reducers start running after the Shuffle/Sort phase has ended (after the mappers are done).

blazy
  • It seems to not be that way: [link](http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-troubleshoot-slow-6.html) `mapred.reduce.slowstart.completed.maps The amount of maps tasks that should complete before reduce tasks are attempted. Not waiting long enough may cause “Too many fetch-failure” errors in attempts.` But, I've found a nuance with what we are attempting--the Combiner, though it runs as part of a Mapper task/memory, logs as a Reducer--so that's the part we are digging into now. – David Beveridge Apr 15 '14 at 23:15
  • @David **mapred.reduce.slowstart.completed.maps** represents a threshold that must be reached before the Reduce phase commences (see the first sketch after this thread). It is important to note that the Reduce phase is composed of three steps: shuffle, sort, and reduce. The shuffle step can commence before all the Mappers are done, since it involves sending data across the network to the prospective Reducers, whereas the sort and reduce steps commence only after all Mappers are done. As for the use of a Combiner, be wary: it can incur some overhead cost. Use it only when necessary, and your job should not depend on it. – blazy Apr 16 '14 at 07:46
  • Thanks for the additional info. We've managed to wrangle it down by using that--so one job works. Our other job is having a similar issue, but with the mappers: each of the mappers must load an AI, which takes about 32-48GB of RAM to operate. What we REALLY need to do is restrict the mappers so that each node runs no more than one at a time (it looks like each node is currently running 8, presumably one per core). So, is there a way to limit the mappers running _per node_ to a specific number (in our case, 1, though we'd make it configurable)? – David Beveridge Apr 17 '14 at 22:04
  • @DavidBeveridge I think what you're looking to do is to set `mapred.tasktracker.map.tasks.maximum` to 1. – blazy Apr 18 '14 at 09:36
  • It is! It turns out that the problem was that EC2 was overriding/ignoring our setting in the job (our setting would show up in the logs, but it wasn't until we altered the .xml files directly--see the second sketch after this thread--that any changes would stick). Thanks for all your help. Oh, on a side note, we figured out the asymmetric machines trick: when the Reducers are complete, the original job class starts another process on the Master itself--we can make that a huge machine (i2.4xl or such), but keep the workers moderate (m2.4xl). – David Beveridge Apr 18 '14 at 16:55
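
To make the slow-start discussion above concrete, here is a minimal sketch of a `mapred-site.xml` entry (MRv1 property name as quoted in the thread; the value of 1.00 is just one possible choice) that holds every reduce task back until 100% of the map tasks have completed:

```xml
<!-- Sketch only: can be set cluster-wide in mapred-site.xml or per job.
     The default of 0.05 lets reducers launch early so the shuffle can
     overlap with the still-running map tasks. -->
<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>1.00</value>
  <description>Fraction of map tasks that must complete before any reduce
    tasks are scheduled; 1.00 means "wait for every mapper".</description>
</property>
```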
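
And for the per-node mapper limit: `mapred.tasktracker.map.tasks.maximum` is read by each TaskTracker at startup, which is consistent with the observation that only editing the cluster's .xml files made the change stick. A minimal sketch of that entry (same caveats as above):

```xml
<!-- Sketch only: a per-TaskTracker (per-node) setting, so it belongs in the
     cluster's mapred-site.xml rather than the per-job configuration. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
  <description>Run at most one map task at a time on each node, so a single
    32-48GB mapper gets the whole machine to itself.</description>
</property>
```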