I am new to mrjob and I am having trouble getting a job to run on Amazon EMR. I will list my problems in order.
- I can run an mrjob job on my local machine. However, when I have a config file at /home/ankit/.mrjob.conf and at /etc/mrjob.conf, the job no longer runs on my local machine. Here is the output I get: https://s3-ap-southeast-1.amazonaws.com/imagna.sample/local.txt
- What is MRJOB_CONF in "the location specified by MRJOB_CONF" in the documentation?
- What is 'base_tmp_directory' used for? Also, do I need to upload the input data to S3 before starting the job, or will it be uploaded from my local computer when the job starts?
- Do I need to do any bootstrapping if I use libraries such as numpy, scikit, etc.? If yes, how?
- This is the output I get when I run the command to start a job on EMR: https://s3-ap-southeast-1.amazonaws.com/imagna.sample/emr.txt
Any solutions?
Thanks a lot.