I am working on a master's thesis project that aims to integrate a custom MapReduce framework, which exposes a similar MR interface but has its own implementation and pipeline, with higher-level language frameworks such as Pig.
Currently, the MR master and workers have been integrated with YARN, so such jobs can be launched on YARN. The framework is written in C++ and runs Map and Reduce functions defined in OpenCL.
The aim is to make the proprietary MR framework usable in as many scenarios as possible, keeping its own pipeline, with minimal changes to the applications or frameworks that currently use Hadoop MapReduce.
Given the large landscape of Hadoop projects, I would appreciate pointers to resources, literature, or documentation on how this has been achieved (Pig can run on top of at least Hadoop MR and Spark) or on which options could be considered. (I am already reading up on YARN, Pig, and so on, but some pointers would be really helpful.)