Folks, the following Python script terminates with

job state = FAILED

and

Last State Change: Access denied checking streaming input path: s3n://elasticmapreduce/samples/wordcount/input/

Code:

import boto
import boto.emr
from boto.emr.step import StreamingStep
from boto.emr.bootstrap_action import BootstrapAction
import time

S3_BUCKET="mytesetbucket123asdf"
conn = boto.connect_emr()

step = StreamingStep(
  name='Wordcount',
  mapper='s3n://elasticmapreduce/samples/wordcount/wordSplitter.py',
  reducer='aggregate',
  input='s3n://elasticmapreduce/samples/wordcount/input/',
  output='s3n://' + S3_BUCKET + '/wordcount/output/2013-10-25')

jobid = conn.run_jobflow(
    name="test",
    log_uri="s3://" + S3_BUCKET + "/logs/",
    visible_to_all_users=True,
    steps=[step])

state = conn.describe_jobflow(jobid).state
print "job state = ", state
print "job id = ", jobid
while state not in (u'COMPLETED', u'FAILED', u'TERMINATED'):
    print time.localtime()
    time.sleep(10)
    state = conn.describe_jobflow(jobid).state
    print conn.describe_jobflow(jobid)
    print "job state = ", state
    print "job id = ", jobid

print "final output can be found in s3://" + S3_BUCKET + "/wordcount/output/2013-10-25"
print "try: $ s3cmd sync s3://" + S3_BUCKET + "/wordcount/output/2013-10-25 ."
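
A polling loop like the one above can be factored into a small helper that stops on any terminal state rather than only on success. This is just a sketch: `wait_for_jobflow`, `TERMINAL_STATES`, and the `get_state` callable are names made up for illustration, not part of boto, and the state list assumes the usual EMR job flow states.

```python
import time

# Job flow states after which polling is pointless (assumed EMR state names).
TERMINAL_STATES = ("COMPLETED", "FAILED", "TERMINATED")


def wait_for_jobflow(get_state, poll_seconds=10, sleep=time.sleep):
    """Poll get_state() until it returns a terminal state, then return it.

    get_state is any zero-argument callable returning the current state
    string; sleep is injectable so the loop can be exercised without waiting.
    """
    state = get_state()
    while state not in TERMINAL_STATES:
        sleep(poll_seconds)
        state = get_state()
    return state
```

With the script above it would presumably be called as `wait_for_jobflow(lambda: conn.describe_jobflow(jobid).state)`.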
Cmag
  • what happens if you try `input='s3n://elasticmapreduce/samples/wordcount/input',` or `input='s3n://elasticmapreduce/samples/wordcount/input/*'` instead? – alko Nov 01 '13 at 13:34

1 Answer

The problem appears to be somewhere in boto. If we specify an IAM user's credentials instead of using an IAM role, the job works perfectly. EMR does support IAM roles, of course, and the role we tested with has full rights to execute any task, so it is not a misconfiguration issue.
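
For reference, a minimal sketch of the IAM user workaround, assuming the user's keys are exported in the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables. The helper names `emr_credentials_from_env` and `connect_emr_as_iam_user` are made up for illustration; only `boto.connect_emr` and its two keyword arguments come from boto itself.

```python
import os


def emr_credentials_from_env(environ=None):
    """Return IAM user keys as keyword arguments for boto.connect_emr."""
    environ = os.environ if environ is None else environ
    # Raises KeyError if either variable is missing, so a missing key
    # is noticed here instead of silently falling back to role credentials.
    return {
        "aws_access_key_id": environ["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": environ["AWS_SECRET_ACCESS_KEY"],
    }


def connect_emr_as_iam_user(environ=None):
    """Open an EMR connection with explicit IAM user keys."""
    import boto  # imported lazily so the helper above works without boto
    return boto.connect_emr(**emr_credentials_from_env(environ))
```

Replacing `conn = boto.connect_emr()` in the question with `conn = connect_emr_as_iam_user()` should reproduce the setup that worked for us.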

Cmag