I'm able to run a local mapper and reducer written in Ruby against an input file, but I'm unclear about how this behaves on a distributed system.
For the production system, I have HDFS set up across two machines. I know that if I store a large file on HDFS, its blocks will be spread across both machines to allow for parallelization. Do I also need to store the actual mapper and reducer files (my Ruby scripts, in this case) on HDFS as well?
Also, how would I then go about actually running the streaming job so that it runs in parallel on both machines?
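From what I've gathered in the docs, the invocation would look roughly like the following (the streaming jar location and the HDFS input/output paths here are guesses on my part), but I'm not sure whether passing the scripts via `-files` means I don't need to put them on HDFS myself, or whether this alone is enough to get the job running on both machines:

```shell
# My guess at the streaming invocation; jar path and HDFS paths assumed.
hadoop jar "$HADOOP_HOME"/share/hadoop/tools/lib/hadoop-streaming-*.jar \
  -files mapper.rb,reducer.rb \
  -mapper "ruby mapper.rb" \
  -reducer "ruby reducer.rb" \
  -input /user/me/input \
  -output /user/me/output
```

Is this the right shape, and does anything else need to be configured for the work to be split across both nodes?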