I'm trying to copy data from an EMR cluster to S3 using s3-distcp. Can I specify the number of reducers to a greater value than the default so as to fasten my process?
Asked
Active
Viewed 1,213 times
2 Answers
5
For setting up number of reducers, you can use the property mapreduce.job.reduces
similar to below:
s3-dist-cp -Dmapreduce.job.reduces=10 --src hdfs://path/to/data/ --dest s3://path/to/s3/

Mandeep Singh
- 308
- 2
- 5
- 18
0
Using S3DistCp, you can efficiently copy large amounts of data from Amazon S3 into HDFS where it can be processed by subsequent steps in your Amazon EMR cluster.
You can call S3DistCp by adding it as a step in your existing EMR cluster. Steps can be added to a cluster at launch or to a running cluster using the console, AWS CLI, or API.
So you control the number of workers during EMR cluster creation or you can resize existing cluster. You can check exact steps in EMR docs.

Ankit Gupta
- 730
- 5
- 12