I need to pass multiple files to a Hadoop streaming job. According to the docs, the -file option accepts a directory as input as well; however, it does not seem to work, and the reducer throws a file-not-found error. One alternative is to pass each file separately with its own -file option, which is not practical since I have hundreds of files. Another option is to tar the files, ship the tarball, and untar it in the reducer.
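For what it's worth, the "one -file per file" workaround above can at least be automated with a small shell loop instead of typing hundreds of flags by hand. This is only a sketch; `mydir` and the streaming jar path are placeholders:

```shell
# Build a repeated "-file <path>" argument string from every file in a directory
# (mydir is a hypothetical local directory holding the files to ship).
FILE_ARGS=$(for f in mydir/*; do printf ' -file %s' "$f"; done)

# The generated arguments would then be spliced into the streaming invocation,
# e.g. (jar path is a placeholder):
#   hadoop jar hadoop-streaming.jar $FILE_ARGS -mapper ... -reducer ...
echo "$FILE_ARGS"
```

Note that `$FILE_ARGS` is deliberately left unquoted at the call site so the shell splits it into separate arguments; this breaks if any filename contains spaces.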
Are there any better options?
Ideally I would just pass the directory as the value of the -file parameter, since the Hadoop documentation suggests that -file accepts a directory as well.