I'm running a Python streaming job on Amazon's Elastic MapReduce that needs to output multiple files from the reducer. The descriptions I've found on the web of how to do this are all old and reference the deprecated property mapred.work.output.dir. When I try to create files in the directory pointed to by the modern equivalent, mapreduce.task.output.dir (i.e. mapreduce_task_output_dir for streaming jobs), I get a "No such file or directory" error:
OSError: [Errno 2] No such file or directory: 's3://mybucket-data/output/encounter/_temporary/1/_temporary/attempt_1416321762038_0001_r_000003_0'
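For reference, here is a minimal sketch of the kind of thing the reducer does. It is not my exact code: the helper names side_file_path and is_local_path and the file name extra.txt are made up for illustration. It reads the directory from the environment variable that streaming exposes and builds a side-file path under it. The second helper just checks whether that path has a URI scheme, since Python's built-in file functions would presumably treat an s3:// URI as an ordinary local path.

```python
import os
from urllib.parse import urlparse


def side_file_path(name, environ=os.environ):
    """Build the path of a side file inside the task's work output directory."""
    # Hadoop streaming exports job properties as environment variables,
    # with the dots in the property name replaced by underscores.
    work_dir = environ.get("mapreduce_task_output_dir")
    if work_dir is None:
        raise KeyError("mapreduce_task_output_dir is not set")
    return work_dir.rstrip("/") + "/" + name


def is_local_path(path):
    """Return True if the path has no URI scheme or uses file://."""
    # An s3:// or hdfs:// URI is not something open() or os.makedirs()
    # can resolve; they would look for it on the local filesystem.
    return urlparse(path).scheme in ("", "file")


if __name__ == "__main__":
    path = side_file_path("extra.txt")  # hypothetical file name
    if is_local_path(path):
        with open(path, "w") as handle:
            handle.write("example\n")
    else:
        print("work output dir is not local:", path)
```

On my cluster the else branch is the one that would run, because the property holds the s3:// URI shown in the error above.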
The documentation for FileOutputFormat.getWorkOutputPath() seems to indicate that this should still work.
I suspect the issue is that this path points to S3, but I don't know whether I should be using a different (i.e. local) directory (and if so, which property gives me one?), figuring out how to get Python to write to S3, or something else entirely.