Amazon Elastic Map Reduce Hadoop Jobs

Question

I'm new to Amazon Web Services and the Map Reduce stuff. I am working on an academic project where I process a large batch of images and need to detect a particular object in them. Afterwards I need a Map filled with the detected objects, with key = averageRGB and value = BufferedImage of the detected object. I managed to write this application single-threaded and that was not a problem.

My questions are:

If I write a map reduce job, can I build the Map mentioned above?

If this is possible, can I do something with the Map before the job finishes, so that I get the final results?

And one last question: if I upload my sample data to a single folder in an S3 bucket, will Amazon Elastic Map Reduce take care of splitting that data across the cluster and parallelizing the process, or do I have to split the data over the cluster myself?

Excuse my ignorance but I cannot find the right answers on the net.

Thanks

score 0 · Answer 1 · answered Oct 16 '14 at 07:01

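Yes, you can build the map you described: the mapper can emit key/value pairs with averageRGB as the key and the detected object as the value.

In the reducer you again receive each key together with its values, so you can do further calculations there before the final results are written.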
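As a rough illustration of that key/value flow, here is a minimal sketch in plain Java with no Hadoop dependency. The class name `AverageRgbGrouping` and the helpers `averageRgb`, `groupByKey`, `reduce` and `solid` are hypothetical names made up for this example, and the object detection itself is assumed to have happened already. In a real Hadoop job the same logic would live in Mapper and Reducer classes, and the image would have to be wrapped in a Writable type (for example its encoded bytes), since BufferedImage is not one.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a MapReduce job; it only mimics the data flow.
class AverageRgbGrouping {

    // "Map" step: compute the key for one detected object,
    // its average colour packed as 0xRRGGBB.
    static int averageRgb(BufferedImage img) {
        long r = 0, g = 0, b = 0;
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                int rgb = img.getRGB(x, y);
                r += (rgb >> 16) & 0xFF;
                g += (rgb >> 8) & 0xFF;
                b += rgb & 0xFF;
            }
        }
        long n = (long) img.getWidth() * img.getHeight();
        return (int) (((r / n) << 16) | ((g / n) << 8) | (b / n));
    }

    // "Shuffle" step: collect all values that share a key,
    // which the framework does between the map and reduce phases.
    static Map<Integer, List<BufferedImage>> groupByKey(List<BufferedImage> objects) {
        Map<Integer, List<BufferedImage>> grouped = new TreeMap<>();
        for (BufferedImage obj : objects) {
            grouped.computeIfAbsent(averageRgb(obj), k -> new ArrayList<>()).add(obj);
        }
        return grouped;
    }

    // "Reduce" step: called once per key with every value for that key.
    // Here it just counts the objects; any other calculation could go here.
    static int reduce(int key, List<BufferedImage> values) {
        return values.size();
    }

    // Test helper: a 2x2 image filled with a single colour.
    static BufferedImage solid(int rgb) {
        BufferedImage img = new BufferedImage(2, 2, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < 2; y++) {
            for (int x = 0; x < 2; x++) {
                img.setRGB(x, y, rgb);
            }
        }
        return img;
    }

    public static void main(String[] args) {
        List<BufferedImage> detected =
                Arrays.asList(solid(0xFF0000), solid(0xFF0000), solid(0x0000FF));
        for (Map.Entry<Integer, List<BufferedImage>> e : groupByKey(detected).entrySet()) {
            System.out.printf("key=%06X objects=%d%n",
                    e.getKey(), reduce(e.getKey(), e.getValue()));
        }
    }
}
```

The point of the sketch is only that the reducer sees every value for a given averageRGB at once, which is where any extra processing before the final output would go.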

When you upload your data to an S3 bucket, you can use an s3n path as your input. Also specify an S3 bucket path, again using s3n, to store the output.

When you provide the input path using s3n, EMR automatically reads the files onto the EMR nodes, splits the input and distributes it over all the nodes. You do not need to do anything for that yourself.
