In JupyterHub, installed on an EC2 instance with an IAM role that allows access to a specific S3 bucket, I try to access a file in that bucket with this code:
s3nRdd = spark.sparkContext.textFile("s3n://bucket/file")
I get this error:
IllegalArgumentException: u'AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3n URL, or by setting the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey properties (respectively).'
However, when I export an AWS access key ID and secret access key in the kernel configuration (belonging to a user with the same policy as that role), the read of that file succeeds.
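For reference, that kernel-level export is equivalent to setting the fs.s3n.* properties named in the error message. A minimal sketch of the explicit form, assuming the same spark session as above; the key values are placeholders standing in for the exported credentials:

    # Minimal sketch, assuming the same `spark` session as in the snippet above.
    # The key values are placeholders, not real credentials.
    hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
    hadoop_conf.set("fs.s3n.awsAccessKeyId", "<ACCESS_KEY_ID>")          # placeholder
    hadoop_conf.set("fs.s3n.awsSecretAccessKey", "<SECRET_ACCESS_KEY>")  # placeholder

    s3nRdd = spark.sparkContext.textFile("s3n://bucket/file")
    s3nRdd.take(1)  # the read succeeds once explicit keys are provided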
As the best practice is to use IAM roles, why doesn't the EC2 role work in this situation?
--update-- The EC2 IAM role has these two policies attached:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1488892557621",
      "Action": "s3:*",
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::<bucket_name>",
        "arn:aws:s3:::<bucket_name>/*"
      ]
    }
  ]
}
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "ec2:*",
      "Effect": "Allow",
      "Resource": "*"
    },
    {
      "Sid": "Stmt1480684159000",
      "Effect": "Allow",
      "Action": [
        "iam:PassRole"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
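As a sanity check (not part of the original setup), one can confirm from inside the kernel whether the instance-profile credentials are visible at all. This sketch assumes boto3 is installed; it resolves credentials from the instance metadata service:

    # Hedged sanity check: can the kernel see the EC2 instance-profile credentials?
    # boto3 is an assumption here, not part of the original setup.
    import boto3

    creds = boto3.Session().get_credentials()
    print(creds.access_key if creds else "no credentials resolved")

    # Shows which principal the kernel is running as (the EC2 role, if attached).
    print(boto3.client("sts").get_caller_identity()["Arn"])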
Also, I am using Hadoop version 2.4.0, which doesn't support the s3a protocol, and updating is not an option.