
I'm trying to run an instance on Amazon EC2 using Python mrjob. Here is the simple Python script to find the most used word in a txt file:

from mrjob.job import MRJob

class MRWordFrequencyCount(MRJob):
    def mapper(self, _, line):
        yield "chars", len(line)
        yield "words", len(line.split())
        yield "lines", 1

    def reducer(self, key, values):
        yield key, sum(values)


if __name__ == '__main__':
    MRWordFrequencyCount.run()

Here is my mrjob.conf file:

runners:
  emr:
    aws_access_key_id: XXXXXXXXXXXXXXXXXX
    aws_secret_access_key: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    aws_region: us-west-1
    ec2_key_pair: EMR
    ec2_key_pair_file: ~/EMR.pem   # ~/ and $ENV_VARS allowed here
    ssh_tunnel_to_job_tracker: true

When I run the script:

python MRMostUsedWord.py -r emr romeo.txt > most_used_word.out

I get the following error:

<Error>
<Type>Sender</Type>
<Code>ValidationError</Code>
<Message>InstanceProfile is required for creating cluster</Message>
</Error>
<RequestId>4d1a1e3b-e665-11e4-b9e1-a557982e1081</RequestId>
</ErrorResponse>

Do you have any idea why I am getting this error?

I am also creating instance profiles using the command:

aws emr create-default-roles

Maybe the mrjob.conf file needs to be modified, but I don't know how.

hero

1 Answer


If you use AWS IAM to configure AWS permissions, you can specify an IAM profile for a job with the iam_job_flow_role mrjob option. See the mrjob documentation for iam_job_flow_role for more details. In the default case, that requires the following line in mrjob.conf:

iam_job_flow_role: EMRDefaultRole
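
For context, here is a sketch of how the question's mrjob.conf might look with that option in place. It assumes the option belongs under runners/emr next to the other EMR settings and is written with underscores like the other keys in the file. The role name is copied from the line above and is only a placeholder: the instance profile that actually exists in your account may be named differently (for example, aws emr create-default-roles usually creates one called EMR_EC2_DefaultRole), and aws iam list-instance-profiles will show what is available.

```yaml
runners:
  emr:
    aws_access_key_id: XXXXXXXXXXXXXXXXXX
    aws_secret_access_key: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    aws_region: us-west-1
    ec2_key_pair: EMR
    ec2_key_pair_file: ~/EMR.pem   # ~/ and $ENV_VARS allowed here
    ssh_tunnel_to_job_tracker: true
    # Assumption: replace with the name of an instance profile that
    # exists in your AWS account; EMRDefaultRole is a placeholder.
    iam_job_flow_role: EMRDefaultRole
```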
alko