
I am not getting any output in S3 when I run a job in Amazon EMR.

I specified the arguments:

-inputfile s3n://exdsyslab/data/file.txt -outputdir s3n://exdsyslab/output

When I checked the job log, I saw that the job completed successfully, but there is no output in the output folder of my bucket exdsyslab.

I also tried one more thing: I chained two jobs, specifying these arguments while creating the job flow:

-inputfile s3n://exdsyslab/data/file.txt -outputdir s3n://exdsyslab/result -outputdir1 s3n://exdsyslab/result1

The second job's input is the output of the first job.

While the second job was running, I hit the following exception:

The output folder, "result", already exists.

This happened because the directory was created by the first job in the chain. How do I specify the input and output for the second job in the mapreduce chain?

Why is there no output in the S3 buckets specified in the arguments?


1 Answer


For correct output, use this:

-inputfile s3n://exdsyslab/data/file.txt -output s3n://exdsyslab/output

Note that the output directory is specified by "-output".

For chaining jobs: you can't do it the way you specified. You must add multiple steps to the job flow, one step per job, and each step gets its own input and output arguments. This other answer may help you: https://stackoverflow.com/a/11109592/1203129
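As a minimal sketch of "one step per job" (not from the original answer): the two steps below could be added to an existing job flow with boto3's `add_job_flow_steps`. The jar path, step names, and job-flow id are placeholders I've made up; only the `Args` flags mirror the answer.

```python
# Hedged sketch: chaining two jobs as two steps of one EMR job flow.
# Jar location, step names, and job-flow id are placeholders.

def build_steps():
    """Build two steps; step 2 reads the directory step 1 writes."""
    jar = "s3n://exdsyslab/jobs/myjob.jar"  # placeholder jar location
    return [
        {
            "Name": "step-1",
            "ActionOnFailure": "CANCEL_AND_WAIT",
            "HadoopJarStep": {
                "Jar": jar,
                "Args": [
                    "-inputfile", "s3n://exdsyslab/data/file.txt",
                    "-output", "s3n://exdsyslab/result",
                ],
            },
        },
        {
            "Name": "step-2",
            "ActionOnFailure": "CANCEL_AND_WAIT",
            "HadoopJarStep": {
                "Jar": jar,
                "Args": [
                    "-input", "s3n://exdsyslab/result",  # step 1's output
                    "-output", "s3n://exdsyslab/result1",
                ],
            },
        },
    ]

if __name__ == "__main__":
    import boto3  # imported here so build_steps stays testable offline
    emr = boto3.client("emr")
    emr.add_job_flow_steps(JobFlowId="j-XXXXXXXXXXXXX", Steps=build_steps())
```

Because both steps run in the same flow, step 2 only starts after step 1 finishes, which is what makes reading `result` as step 2's input safe.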

For your specific case, the input/output directories have to look like this:

Step 1:

 -inputfile s3n://exdsyslab/data/file.txt -output s3n://exdsyslab/result 

Step 2:

 -input s3n://exdsyslab/result -output s3n://exdsyslab/result1
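One more note: Hadoop refuses to write into an output directory that already exists, so re-running this flow without clearing the previous run's results will fail with the same "already exists" error. A hedged sketch of clearing a stale output prefix, assuming boto3; `stale_output_keys` is a hypothetical helper name:

```python
# Hedged sketch: delete a previous run's output prefix from S3 so a
# re-run of the job flow doesn't fail with "output ... already exists".

def stale_output_keys(keys, prefix):
    """Return only the keys that live under the given output prefix."""
    return [k for k in keys if k.startswith(prefix)]

if __name__ == "__main__":
    import boto3  # imported here so stale_output_keys stays testable offline
    s3 = boto3.client("s3")
    listing = s3.list_objects_v2(Bucket="exdsyslab", Prefix="result/")
    keys = [obj["Key"] for obj in listing.get("Contents", [])]
    for key in stale_output_keys(keys, "result/"):
        s3.delete_object(Bucket="exdsyslab", Key=key)
```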