
I am not getting any output in S3 when I run a job in Amazon EMR.

I specified the arguments:

-inputfile s3n://exdsyslab/data/file.txt -outputdir s3n://exdsyslab/output

When I checked the job log, I saw that the job completed successfully, but there is no output in the output folder of my bucket exdsyslab.

I also tried one more thing: I chained two jobs, specifying these arguments while creating the job flow:

-inputfile s3n://exdsyslab/data/file.txt -outputdir s3n://exdsyslab/result -outputdir1 s3n://exdsyslab/result1

The second job's input is the output of the first job.

While the second job was running, I hit the following exception:

The output folder, "result", already exists.

This happened because the directory was created by the first job in the chain. How do I specify the input and output for the second job in the mapreduce chain?

Why is there no output in the S3 buckets specified in the arguments?


1 Answer


For correct output, use this:

-inputfile s3n://exdsyslab/data/file.txt -output s3n://exdsyslab/output

Note that the output directory is specified by "-output".

For chaining jobs: you can't do it the way you specified. You must add multiple steps to the job flow, one step per job, and each step gets its own input and output arguments. This other answer may help you: https://stackoverflow.com/a/11109592/1203129
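As a minimal sketch of "one step per job" (not from the original answer): the two steps below could be added to an existing job flow with boto3's `add_job_flow_steps`. The jar path, step names, and job-flow id are placeholders I've made up; only the `Args` flags mirror the answer.

```python
# Hedged sketch: chaining two jobs as two steps of one EMR job flow.
# Jar location, step names, and job-flow id are placeholders.

def build_steps():
    """Build two steps; step 2 reads the directory step 1 writes."""
    jar = "s3n://exdsyslab/jobs/myjob.jar"  # placeholder jar location
    return [
        {
            "Name": "step-1",
            "ActionOnFailure": "CANCEL_AND_WAIT",
            "HadoopJarStep": {
                "Jar": jar,
                "Args": [
                    "-inputfile", "s3n://exdsyslab/data/file.txt",
                    "-output", "s3n://exdsyslab/result",
                ],
            },
        },
        {
            "Name": "step-2",
            "ActionOnFailure": "CANCEL_AND_WAIT",
            "HadoopJarStep": {
                "Jar": jar,
                "Args": [
                    "-input", "s3n://exdsyslab/result",  # step 1's output
                    "-output", "s3n://exdsyslab/result1",
                ],
            },
        },
    ]

if __name__ == "__main__":
    import boto3  # imported here so build_steps stays testable offline
    emr = boto3.client("emr")
    emr.add_job_flow_steps(JobFlowId="j-XXXXXXXXXXXXX", Steps=build_steps())
```

Because both steps run in the same flow, step 2 only starts after step 1 finishes, which is what makes reading `result` as step 2's input safe.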

For your specific case, the input/output directories have to look like this:

Step 1:

 -inputfile s3n://exdsyslab/data/file.txt -output s3n://exdsyslab/result 

Step 2:

 -input s3n://exdsyslab/result -output s3n://exdsyslab/result1
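One more note: Hadoop refuses to write into an output directory that already exists, so re-running this flow without clearing the previous run's results will fail with the same "already exists" error. A hedged sketch of clearing a stale output prefix, assuming boto3; `stale_output_keys` is a hypothetical helper name:

```python
# Hedged sketch: delete a previous run's output prefix from S3 so a
# re-run of the job flow doesn't fail with "output ... already exists".

def stale_output_keys(keys, prefix):
    """Return only the keys that live under the given output prefix."""
    return [k for k in keys if k.startswith(prefix)]

if __name__ == "__main__":
    import boto3  # imported here so stale_output_keys stays testable offline
    s3 = boto3.client("s3")
    listing = s3.list_objects_v2(Bucket="exdsyslab", Prefix="result/")
    keys = [obj["Key"] for obj in listing.get("Contents", [])]
    for key in stale_output_keys(keys, "result/"):
        s3.delete_object(Bucket="exdsyslab", Key=key)
```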