I'm going through this guide on how to get mrjob working on EMR. I followed all the steps, but when I run the example script I get this error:

matthew@WinterMute:~/work/projects/mrjob_examples$ python word_count.py -r emr moby.txt
using configs in /etc/mrjob.conf
using existing scratch bucket mrjob-4db6342a70e021ad
using s3://mrjob-4db6342a70e021ad/tmp/ as our scratch dir on S3
creating tmp directory /tmp/word_count.matthew.20140603.181541.006786
writing master bootstrap script to /tmp/word_count.matthew.20140603.181541.006786/b.py
Copying non-input files into s3://mrjob-4db6342a70e021ad/tmp/word_count.matthew.20140603.181541.006786/files/
Waiting 5.0s for S3 eventual consistency
Creating Elastic MapReduce job flow
Job flow created with ID: j-3DCN7LULSRILW
Created new job flow j-3DCN7LULSRILW
Job on job flow j-3DCN7LULSRILW failed with status FAILED: The given SSH key name was invalid
Logs are in s3://mrjob-4db6342a70e021ad/tmp/logs/j-3DCN7LULSRILW/
Scanning S3 logs for probable cause of failure
Waiting 5.0s for S3 eventual consistency
Terminating job flow: j-3DCN7LULSRILW
Traceback (most recent call last):
  File "word_count.py", line 16, in <module>
    MRWordFrequencyCount.run()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/job.py", line 494, in run
    mr_job.execute()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/job.py", line 512, in execute
    super(MRJob, self).execute()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/launch.py", line 147, in execute
    self.run_job()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/launch.py", line 208, in run_job
    runner.run()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/runner.py", line 458, in run
    self._run()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/emr.py", line 809, in _run
    self._wait_for_job_to_complete()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/emr.py", line 1599, in _wait_for_job_to_complete
    raise Exception(msg)
Exception: Job on job flow j-3DCN7LULSRILW failed with status FAILED: The given SSH key name was invalid
mdornfe1
  • I'm currently having the same issue, but it is intermittent: it has worked on one occasion, and the rest of my attempts have failed. – Tom Busby Aug 03 '14 at 22:19

2 Answers

Your job seems to start fine, but then mrjob is unable to ssh to the master node in order to monitor its status. It's hard to tell exactly what is set incorrectly without seeing your config file, mainly the ec2_key_pair_file and ec2_key_pair options. Make sure you followed the Configuring AWS credentials guide. You have to specify a valid key pair name (check the "Key Pairs" section of the EC2 management dashboard) and the path to the corresponding .pem file.
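As a rough local sanity check of the two options named above, something like the following could be run against the `emr` section of the config once it has been loaded into a dict. This is only a sketch: `check_key_pair_options` is a hypothetical helper, not part of mrjob, and it only confirms that both options are present and that the .pem path points to a file. It cannot tell whether AWS recognises the key pair name.

```python
import os


def check_key_pair_options(emr_opts):
    """Return a list of problems found in the 'emr' runner options.

    emr_opts is assumed to be the already-parsed runners -> emr mapping
    from mrjob.conf (a plain dict). This helper is hypothetical and is
    not part of mrjob itself.
    """
    problems = []
    name = emr_opts.get("ec2_key_pair")
    pem_path = emr_opts.get("ec2_key_pair_file")

    if not name:
        problems.append("ec2_key_pair is not set")

    if not pem_path:
        problems.append("ec2_key_pair_file is not set")
    else:
        # Expand "~" so paths such as ~/EMR.pem are checked correctly.
        expanded = os.path.expanduser(pem_path)
        if not os.path.isfile(expanded):
            problems.append("ec2_key_pair_file does not exist: " + expanded)

    return problems
```

An empty list means both options are set and the file exists. Whether the name matches a key pair in your AWS account still has to be checked in the EC2 dashboard.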

alko
  • This is my conf file http://pastebin.com/qGxbiJsd saved in /etc/mrjob.conf. I'm pretty sure I followed the guide exactly (I deleted all those security credentials btw). – mdornfe1 Jun 05 '14 at 18:27

I found this question when searching for the error myself.

I managed to solve this: EC2 SSH key pairs are region-specific, so you need to set the region in your mrjob.conf file to the one the key pair belongs to:

runners:
    emr:
        aws_access_key_id: HADOOPHADOOPBOBADOOP
        aws_region: us-west-1
        aws_secret_access_key: MEMIMOMADOOPBANANAFANAFOFADOOPHADOOP

See here: https://pythonhosted.org/mrjob/guides/configs-basics.html
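If you are not sure which region a key pair lives in, one way to find out is to ask EC2 in each candidate region. The sketch below assumes boto3 is installed and credentials are configured; boto3 is my choice here and is not what the mrjob guide uses. `regions_with_key_pair` and `boto3_clients` are hypothetical helper names. The lookup relies on `describe_key_pairs` raising an error with code `InvalidKeyPair.NotFound` when the name is unknown in that region.

```python
def regions_with_key_pair(key_name, clients):
    """Return the regions (sorted) in which a key pair named key_name exists.

    clients maps a region name to an object exposing a boto3-style
    describe_key_pairs(KeyNames=[...]) method.
    """
    found = []
    for region, client in sorted(clients.items()):
        try:
            client.describe_key_pairs(KeyNames=[key_name])
        except Exception as exc:
            # botocore's ClientError keeps the AWS error code in exc.response.
            code = getattr(exc, "response", {}).get("Error", {}).get("Code")
            if code == "InvalidKeyPair.NotFound":
                continue  # no such key pair in this region
            raise  # anything else (credentials, network) is a real error
        found.append(region)
    return found


def boto3_clients(regions):
    """Build one EC2 client per region (assumes boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helper above has no hard dependency

    return {r: boto3.client("ec2", region_name=r) for r in regions}
```

For example, `regions_with_key_pair("EMR", boto3_clients(["us-east-1", "us-west-1"]))` would list the regions that know a key pair called `EMR` (a placeholder name). The region it reports is the one to put in `aws_region`.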

robertlayton