
I need to pass multiple files to a Hadoop streaming job. According to the documentation, the -file option takes a directory as input as well; however, it does not seem to work, and the reducer throws a file-not-found error. One alternative is to pass each file separately with its own -file option, which is not practical given that I have hundreds of files. Another is to bundle the files into a tarball, pass that, and unpack it in the reducer.

Are there any better options?

Ideally I would just pass the directory as the value of the -file parameter, since the Hadoop documentation suggests that -file accepts a directory as well.
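For reference, the invocation I am attempting looks roughly like this (paths and script names are placeholders):

```bash
# Streaming job that ships the mapper/reducer scripts plus a whole directory
# of side files; "side_data" stands in for my folder of hundreds of files.
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    -input   /user/me/input \
    -output  /user/me/output \
    -mapper  mapper.py \
    -reducer reducer.py \
    -file    mapper.py \
    -file    reducer.py \
    -file    side_data   # the directory; this is the part that fails for me
```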

akshit

1 Answer


Are you sure it is the reducer that throws the file-not-found error? If the error really does come from the reducer, it sounds more like the user running the job not being able to read the results folder.

-file definitely works with a directory; I have a Hadoop streaming job which takes a directory and runs against the six files in that folder.
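As a minimal sketch of how the shipped folder can be picked up, assuming it was passed as -file side_data (a placeholder name): files shipped this way are made available in the task's working directory, so the reducer can open them by relative path.

```bash
#!/usr/bin/env bash
# reducer.sh -- sketch only; "side_data" is a placeholder for the shipped folder.
# List the side files visible in the task's working directory (goes to the task log).
for f in side_data/*; do
    echo "side file available: $f" >&2
done

# Usual streaming reducer loop: tab-separated key/value pairs on stdin.
while IFS=$'\t' read -r key value; do
    printf '%s\t%s\n' "$key" "$value"
done
```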

Remember that the path supplied to -file is the path in HDFS, so use hadoop fs -ls to make sure that the path is correct.
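For example (the path is a placeholder):

```bash
# Confirm the directory actually exists at the path being passed to -file.
hadoop fs -ls /user/me/side_data
```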

Finally, make sure that the user you are using to run the job has permission to read the directory. Whilst I don't know exactly what error you would get without permission, it could well surface as a "file not found" error.
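One way to check, using the same placeholder path as above:

```bash
# -d lists the directory entry itself, showing its owner and permission bits.
hadoop fs -ls -d /user/me/side_data

# If the job user lacks access, read (and execute, for directory traversal)
# can be granted, provided you have the rights to change the permissions.
hadoop fs -chmod -R o+rx /user/me/side_data
```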

Yeggstry