I am new to MapReduce. I started with the simple word-count example.
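For reference, the job is essentially the canonical Hadoop word count. A minimal sketch using the standard org.apache.hadoop.mapreduce API (class and field names here are illustrative, not my exact code):

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    // Mapper: emit (word, 1) for every token in the input line.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {

        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sum the per-word counts emitted by the mappers.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }
}
```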

Using the Eclipse IDE, I created a simple Java Maven project, added the MapReduce dependencies, compiled my program into a JAR, copied it over to the Cloudera CDH VM, and executed it with dummy input data. Once I was satisfied it was running successfully, I took that JAR to my AWS EMR environment and ran it there against a larger (production) dataset.
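The JAR's entry point is a standard driver class, launched with something like hadoop jar wordcount.jar WordCountDriver <input> <output>. A minimal sketch, assuming the mapper/reducer classes above (the driver class name, combiner choice, and paths are illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCount.TokenizerMapper.class);
        // Summing is associative, so the reducer doubles as a combiner.
        job.setCombinerClass(WordCount.IntSumReducer.class);
        job.setReducerClass(WordCount.IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. dummy input on the CDH VM
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not already exist
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```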

So: Eclipse is my IDE, the Cloudera CDH VM is my dev environment, and AWS EMR is my production environment.

This setup works fine for a small project like word count, but the bigger my MapReduce projects get, the more cumbersome it becomes to transport JAR files between environments, which makes iterative development very tedious.

I was wondering if this environment setup can be tuned, revamped, or scrapped and rebuilt to make it more suitable for iterative, large-scale MapReduce development projects.

Any help/tips appreciated. Thank you.

1 Answer

Not much has changed since I asked this question. I haven't found a good alternative to manually copying JAR files to the Hadoop execution environment. Also see this: Running MapReduce jobs on AWS-EMR from Eclipse
