
This is an odd one. We recently started migrating from an older CDH 4.2.1 cluster running MRv1 to a CM5-managed CDH 5.2.0 cluster running MRv2 (YARN) and have run into some rather unusual problems. The workflow processes roughly 1.2 TB of data; on the CDH 4.2.1 cluster the processing queries use no reducers and each individual map output is stored as a single file (the job takes about 35 minutes).

On the CDH 5.2.0 cluster the workflow fails most of the time (after 3+ times the length of time it normally takes) and always attempts to combine the output of all the mappers into a single file. We think this is where it is falling over.

All the error logs point to the shuffle-and-sort phase failing with an out-of-heap-space error.

We have tried using both parameters that specify the number of reducers (mapred.reduce.tasks = 0 and mapreduce.job.reduces = 0), but this has no effect.
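For reference, what we ran in the Hive session looks roughly like this (a minimal sketch; the two parameter names are the real MRv1/MRv2 ones, the comments are just annotations):

    SET mapred.reduce.tasks=0;       -- legacy MRv1 parameter name
    SET mapreduce.job.reduces=0;     -- MRv2 (YARN) equivalent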

This is a HiveQL query using a Python transform to process data fields, and the exact code, queries, tables, and workflow have been migrated.
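For illustration, the query shape is roughly the following (a simplified sketch, not our real code: the table names, columns, script name, and the dynamic-partition column are all placeholders):

    ADD FILE process_fields.py;

    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;

    INSERT OVERWRITE TABLE output_table PARTITION (dt)
    SELECT TRANSFORM (col_a, col_b, col_c)
           USING 'python process_fields.py'
           AS (col_a, col_b, col_c, dt)
    FROM input_table;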

Has anyone else run into this problem or could anyone shed some light on it?

Thanks,

Anthony

  • Managed to eventually fix it by setting hive.optimize.sort.dynamic.partition to false... Seems this parameter is set to true by default in CDH5. – Anthony Apr 13 '15 at 12:35
  • Actually CDH4 doesn't support this parameter at all... – Anthony Apr 13 '15 at 15:28
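
For anyone hitting the same issue, the workaround from the comments above boils down to a single session setting (a minimal sketch; the annotation reflects my understanding of what the optimization does):

    -- On by default in CDH 5.2 (Hive 0.13). When enabled, it routes
    -- dynamic-partition inserts through a reduce stage that sorts rows by
    -- partition key, which is where our shuffle/sort heap errors occurred.
    SET hive.optimize.sort.dynamic.partition=false;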

0 Answers