I am new to MapReduce. I started with the simple word-count example.
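For context, the job itself is essentially the standard Hadoop word-count example, roughly like this (class names are just illustrative):

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every token in the input line
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: sums the counts for each word
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // Driver: configures the job and points it at the input/output paths
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```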
Using the Eclipse IDE, I created a simple Java Maven project, added the MapReduce dependencies, compiled my program into a JAR, copied it over to the Cloudera CDH VM, and ran it against dummy input data. Once I was satisfied it was running successfully, I took that JAR to my AWS EMR environment and ran it there with a larger (production) dataset.
So, Eclipse is my IDE, the Cloudera CDH VM is my dev environment, and AWS EMR is my production environment.
This setup works fine for a small project like word count, but as my MapReduce projects grow, shuttling JAR files between environments becomes increasingly cumbersome and makes iterative development very tedious.
I was wondering whether this environment setup can be tuned, revamped, or scrapped and rebuilt to make it more suitable for iterative, large-scale MapReduce development projects.
Any help/tips appreciated. Thank you.