I am working on a master's thesis project that aims to integrate a custom MapReduce framework, which exposes an interface similar to Hadoop MapReduce but has its own implementation and pipeline, with higher-level language frameworks such as Pig.

Currently, the MR master and workers have been integrated with YARN, so jobs can be launched on YARN. The framework is written in C++ and runs Map and Reduce functions defined in OpenCL.

The aim is to make the proprietary MR framework usable in as many scenarios as possible, keeping its own pipeline, with minimal changes to the applications or frameworks that currently use Hadoop MapReduce.

Given the large landscape of Hadoop projects, I would appreciate pointers to resources, literature, or documentation on how this has been achieved (Pig can run on top of at least Hadoop MR and Spark), or on which options can be considered. (I am already reading up on YARN, Pig, and so on, but some pointers would be really helpful.)

  • Several tools already do what I think you want to do. Scalding, Cascading and Crunch all offer pipelines over MapReduce. However, all are moving or have moved to Tez, which offers a much faster way to run multi-stage jobs without forcing the use of Map -> Reduce stages. – Ben Watson Nov 26 '15 at 17:31
