
I'm trying to run an Elastic MapReduce job with the code below, but when I run it I get an error: InstanceProfile is required for creating cluster

Does anyone know why I'm getting this error?

import boto.emr
from boto.emr.step import StreamingStep

def createmrjob(params):
    # Connect to the EMR endpoint in us-east-1.
    emr = boto.emr.connect_to_region('us-east-1')

    print ""
    print "Connected to Elastic MapReduce."

    print "Creating Streaming step"

    bucket = params['bucket']
    print bucket
    # Streaming step: mapper and reducer scripts, input data, and output location.
    step = StreamingStep(name='Test',
                         mapper=params['mapper'],
                         reducer=params['reducer'],
                         input=params['datafile'],
                         output='s3n://' + bucket + '/uploadedfiles/')

    print "Creating job flow"
    jobid = emr.run_jobflow(name="Data Processing",
                            log_uri="s3://" + bucket + "/uploadedfiles/erm_logs/",
                            steps=[step],
                            num_instances=1)

createmrjob(msg)

I already tried to create an instance profile using IAM:

iam.create_instance_profile("instance", path=None)

and then passed it to the Elastic MapReduce job like this:

    steps=[step],
    api_params={
        'IamInstanceProfile': 'instance',
    }
)

But the issue persists.

techman
  • You have to specify an instanceProfile and a serviceRole for your job; have a look at this [example](http://stackoverflow.com/questions/26314316/how-to-launch-and-configure-an-emr-cluster-using-boto) – FtoTheZ Jun 01 '15 at 13:05
  • Thanks for your answer. I tried to specify an instanceProfile, as you can see in my question, but it doesn't work. Your example shows how to specify the serviceRole, but not how to specify the instanceProfile. – techman Jun 01 '15 at 13:17
  • I think there is some kind of "definition problem". Try setting it like this in your run_jobflow, as they do in the example: job_flow_role="EMR_EC2_DefaultRole" (this should be the instanceProfile ARN, e.g. "instance") and service_role="EMR_DefaultRole" (this should be the Service Role ARN). Both profiles should have sufficient permissions, as described in the EMR documentation. – FtoTheZ Jun 01 '15 at 14:18
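
For reference, the suggestion in the last comment could be sketched roughly as follows. This is only a sketch under some assumptions: the installed boto 2 release has a run_jobflow that accepts the job_flow_role and service_role keyword arguments, and the roles EMR_EC2_DefaultRole and EMR_DefaultRole (the names from the comment) already exist in the account, for example after running `aws emr create-default-roles`. The helper names build_jobflow_kwargs and launch are hypothetical and only used for illustration.

```python
def build_jobflow_kwargs(bucket, steps,
                         job_flow_role="EMR_EC2_DefaultRole",
                         service_role="EMR_DefaultRole"):
    """Collect the keyword arguments passed on to run_jobflow.

    Hypothetical helper; the role names are the ones suggested in the
    comment and are assumed to exist in the AWS account.
    """
    return {
        "name": "Data Processing",
        "log_uri": "s3://" + bucket + "/uploadedfiles/erm_logs/",
        "steps": steps,
        "num_instances": 1,
        # Instance profile used by the cluster's EC2 instances.
        "job_flow_role": job_flow_role,
        # IAM role used by the EMR service itself.
        "service_role": service_role,
    }


def launch(bucket, steps):
    """Start the job flow; needs boto 2 and valid AWS credentials."""
    # Imported here so the helper above can be checked without boto installed.
    import boto.emr
    emr = boto.emr.connect_to_region("us-east-1")
    return emr.run_jobflow(**build_jobflow_kwargs(bucket, steps))
```

With the question's code, this would be called as launch(bucket, [step]). Note that an instance profile created with create_instance_profile alone has no role attached, so passing its name may still not be enough; the profile has to contain a role with the permissions described in the EMR documentation.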

0 Answers