I am running a sample Hadoop job over ~500 documents on S3. When run locally, it completes in under 15 minutes. However, when I ran the same job on EMR, it ran for over 2 hours without finishing the reduce step, so I terminated it. Is there a particular reason why a MapReduce job would take so long on EMR?
Along the same lines, what is the best way to profile a job on EMR to see where the bottleneck is? I can't seem to get the log files from the reducers until they complete, and they are taking far too long to complete.