
In JupyterHub, installed on an EC2 instance with an IAM role that allows access to a specific S3 bucket, when I try to access a file in that bucket with this code:

s3nRdd = spark.sparkContext.textFile("s3n://bucket/file")

I get this error:

IllegalArgumentException: u'AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3n URL, or by setting the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey properties (respectively).'

However, when I export an AWS access key ID and secret access key (belonging to an IAM user with the same policy as that role) in the kernel configuration, the read of that file succeeds.
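
For reference, exporting the keys in the kernel configuration amounts to something like the following (a minimal sketch with placeholder values; the property names are the ones mentioned in the error message):

# Sketch: set the s3n credentials explicitly on the Hadoop configuration
# (placeholder values, not real keys).
sc = spark.sparkContext
sc._jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", "<ACCESS_KEY_ID>")
sc._jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", "<SECRET_ACCESS_KEY>")

s3nRdd = sc.textFile("s3n://bucket/file")
print(s3nRdd.count())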

As the best practice is to use IAM roles, why doesn't the EC2 role work in this situation?

--update-- The EC2 IAM role has these two policies attached:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1488892557621",
            "Action": "s3:*",
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::<bucket_name>",
                "arn:aws:s3:::<bucket_name>/*"
            ]
        }
    ]
}


{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": "ec2:*",
            "Effect": "Allow",
            "Resource": "*"
        },
        {
            "Sid": "Stmt1480684159000",
            "Effect": "Allow",
            "Action": [
                "iam:PassRole"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}

Also, I am using Hadoop version 2.4.0, which doesn't support the s3a protocol, and upgrading is not an option.

– and_apo
  • Remember to run the policy simulator; you will notice that a "validated" policy doesn't necessarily work the way you think. It cost me an hour to learn that IAM policy restrictions only work at the IAM user level. You cannot specify restrictions on resources that are not under IAM (e.g. SQS, S3, etc.). You need to write a specific resource policy (not an IAM policy) and attach it to those resources' permissions. – mootmoot Mar 13 '17 at 10:35

2 Answers


You must create a bucket policy to allow access from particular IAM roles. Since S3 doesn't trust the role, the API just falls back and asks for an access key.

Just add something like this to your bucket policy, replacing all the custom <> parameters with your own values.

{
    "Version": "2012-10-17",
    "Id": "EC2IAMaccesss",
    "Statement": [{
            "Sid": "MyAppIAMRolesAccess",
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::<acc_id>:role/<yourIAMroleName>"
                ]
            },
            "Action": [
                "s3:ListBucket",
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::<yourbucket>/*",
                "arn:aws:s3:::<yourbucket>"
            ]
        }
    ]
}
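
If you prefer to apply the policy programmatically, a minimal sketch using boto3 follows (assumes boto3 is installed and the caller has s3:PutBucketPolicy permission; bucket, account ID, and role name are placeholders):

import json
import boto3

# Same policy document as above, as a Python dict.
bucket_policy = {
    "Version": "2012-10-17",
    "Id": "EC2IAMaccesss",
    "Statement": [{
        "Sid": "MyAppIAMRolesAccess",
        "Effect": "Allow",
        "Principal": {"AWS": ["arn:aws:iam::<acc_id>:role/<yourIAMroleName>"]},
        "Action": ["s3:ListBucket", "s3:GetObject"],
        "Resource": ["arn:aws:s3:::<yourbucket>/*", "arn:aws:s3:::<yourbucket>"]
    }]
}

s3 = boto3.client("s3")
s3.put_bucket_policy(Bucket="<yourbucket>", Policy=json.dumps(bucket_policy))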

(updates)

  1. Make sure you attach a proper policy to the EC2 IAM role. IAM roles are very powerful, but no policy is attached to them out of the box. You must assign a policy; e.g. for minimal S3 access, attach the AmazonS3ReadOnlyAccess managed policy to the role (see the sketch after this list).

  2. You may encounter issues with Spark's problematic interaction with IAM roles. Please check the documentation on Spark access through the s3n:// scheme; otherwise, use s3a://.
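
A minimal sketch of item 1, assuming boto3 and permission to modify the role (the role name is a placeholder):

import boto3

iam = boto3.client("iam")
# Attach the AWS-managed read-only S3 policy to the EC2 instance's role.
iam.attach_role_policy(
    RoleName="<yourIAMroleName>",
    PolicyArn="arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
)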

– mootmoot
  • I have added this as a bucket policy but I still get the same message. – and_apo Mar 12 '17 at 11:53
  • @and_apo Answer updated; check updated items 1 and 2. – mootmoot Mar 13 '17 at 08:48
  • For point 1, my EC2 IAM role has full access to the bucket - I updated my question with the policies attached to that role. Also I cannot use s3a, since the Hadoop version is tied to 2.4 and doesn't support s3a - also updated my question for this. – and_apo Mar 13 '17 at 10:15
  • @and_apo please check whether this applicable to you . https://github.com/nchammas/flintrock/pull/180 – mootmoot Mar 13 '17 at 10:31

S3n doesn't support IAM roles, and 2.4 is a very outdated version anyway. Not as buggy as 2.5 when it comes to s3n, but still less than perfect.

If you want to use IAM roles, you are going to have to switch to s3a, and yes, for you, that does mean upgrading Hadoop. Sorry.
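
For illustration, a minimal sketch of what that looks like once on Hadoop 2.7+ with hadoop-aws and the AWS SDK on the classpath (assumption: the s3a credential chain picks up the EC2 instance-profile credentials, so no keys need to be set):

# Sketch: with s3a, the IAM role attached to the instance is used automatically.
sc = spark.sparkContext

# Optional: pin s3a to instance-profile credentials explicitly
# (this property is only available in newer hadoop-aws releases).
sc._jsc.hadoopConfiguration().set(
    "fs.s3a.aws.credentials.provider",
    "com.amazonaws.auth.InstanceProfileCredentialsProvider",
)

rdd = sc.textFile("s3a://bucket/file")
print(rdd.take(5))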

– stevel