
How can I change this configuration? For my application, a split size of 64/128 MB is too large,

and I would like a split size of 16 MB, for example.

How can I do that?

member555

1 Answer


You can change the default block size by setting fs.s3n.block.size. Try something like this in your job code: jobConf.set("fs.s3n.block.size", value);
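For example, a minimal driver sketch (assuming the classic mapred API; the class name, job name, and input/output paths are illustrative placeholders, and 16 MB is written out in bytes):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class SmallSplitJob {
        public static void main(String[] args) throws Exception {
            JobConf jobConf = new JobConf(SmallSplitJob.class);
            jobConf.setJobName("small-split-job");

            // S3 has no real blocks; this value is what the s3n connector
            // reports as the block size, and FileInputFormat uses it when
            // computing input splits.
            jobConf.set("fs.s3n.block.size", String.valueOf(16 * 1024 * 1024));

            FileInputFormat.setInputPaths(jobConf, new Path(args[0]));
            FileOutputFormat.setOutputPath(jobConf, new Path(args[1]));

            JobClient.runJob(jobConf);
        }
    }

If you are using Hadoop streaming (as mentioned in the comments below), the same property can be passed as a generic option on the command line, e.g. -D fs.s3n.block.size=16777216, placed before the streaming-specific arguments.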

Please refer to the links below as well: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-plan-bootstrap.html

http://s3.amazonaws.com/awsdocs/ElasticMapReduce/latest/emr-dg.pdf

  • I'm using Hadoop streaming – member555 Aug 21 '15 at 14:49
  • Where can I put this setting (fs.s3n.block.size)? I'm running EMR via the AWS website; please help me – member555 Aug 23 '15 at 10:55
  • I haven't tried EMR with streaming. I think you can also change the parameter as described in the link below, but I'm not completely sure: https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html – Manasa Devadas Aug 24 '15 at 13:19
  • OK, I figured it out. Now I have another question: after the job is done, I have a lot of output files. How can I merge them into one before sending them to the second job? I have iterative jobs – member555 Aug 24 '15 at 13:32
  • I'm not getting your question completely. You can give a directory with all the files in it as input to the job; the MR job will process all the files in the directory – Manasa Devadas Aug 25 '15 at 05:19
  • Yes, but I want to concatenate them before that. Also, I can only specify input/output on S3; how can I use HDFS? – member555 Aug 25 '15 at 09:21
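Regarding the merging question in the comments: one way to do it is a small sketch using FileUtil.copyMerge from the Hadoop 1.x/2.x API (the paths here are placeholders). The hadoop fs -getmerge shell command offers similar functionality, but merges to the local filesystem:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class MergeOutputs {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            Path srcDir = new Path(args[0]);  // directory holding part-00000, part-00001, ...
            Path dstFile = new Path(args[1]); // single merged file for the next iteration

            FileSystem srcFs = srcDir.getFileSystem(conf);
            FileSystem dstFs = dstFile.getFileSystem(conf);

            // Concatenates every file under srcDir into one file at dstFile.
            // false = keep the source files; null = no separator between files.
            FileUtil.copyMerge(srcFs, srcDir, dstFs, dstFile, false, conf, null);
        }
    }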