We were trying to use the spark-redshift project, following the third of its recommended ways of providing credentials, namely:
IAM instance profiles: If you are running on EC2 and authenticate to S3 using IAM and instance profiles, then you must configure the temporary_aws_access_key_id, temporary_aws_secret_access_key, and temporary_aws_session_token configuration properties to point to temporary keys created via the AWS Security Token Service. These temporary keys will then be passed to Redshift via LOAD and UNLOAD commands.
Our Spark application runs on an EMR cluster, so we tried to obtain temporary credentials from inside the cluster's instances by calling getSessionToken, like this:
import com.amazonaws.auth.InstanceProfileCredentialsProvider
import com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient
import com.amazonaws.services.securitytoken.model.GetSessionTokenRequest

val stsClient = new AWSSecurityTokenServiceClient(new InstanceProfileCredentialsProvider())
val getSessionTokenRequest = new GetSessionTokenRequest()
val sessionTokenResult = stsClient.getSessionToken(getSessionTokenRequest)
val sessionCredentials = sessionTokenResult.getCredentials()
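For reference, the plan is to feed those temporary keys into the spark-redshift read roughly like this (just a sketch: sqlContext is our SQLContext, and the JDBC URL, table name, and tempdir below are placeholders):

// Sketch of passing the STS temporary keys to spark-redshift.
// url, dbtable and tempdir are placeholders for our real settings.
val df = sqlContext.read
  .format("com.databricks.spark.redshift")
  .option("url", "jdbc:redshift://redshifthost:5439/database?user=u&password=p")
  .option("dbtable", "my_table")
  .option("tempdir", "s3n://my-bucket/tmp")
  .option("temporary_aws_access_key_id", sessionCredentials.getAccessKeyId)
  .option("temporary_aws_secret_access_key", sessionCredentials.getSecretAccessKey)
  .option("temporary_aws_session_token", sessionCredentials.getSessionToken)
  .load()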
But the getSessionToken call throws a 403 Access Denied, even though a policy allowing sts:getSessionToken is attached to the role of the EMR instances.
Then we tried the following two alternatives. First, using the AssumeRole policy:
import com.amazonaws.auth.{AWSSessionCredentials, STSAssumeRoleSessionCredentialsProvider}

val p = new STSAssumeRoleSessionCredentialsProvider("arn:aws:iam::123456798123:role/My_EMR_Role", "session_name")
val credentials: AWSSessionCredentials = p.getCredentials
val token = credentials.getSessionToken
and second, casting the result from InstanceProfileCredentialsProvider:
val provider = new InstanceProfileCredentialsProvider()
val credentials: AWSSessionCredentials = provider.getCredentials.asInstanceOf[AWSSessionCredentials]
val token = credentials.getSessionToken
They both work, but which is the expected way of doing this? Is there something terribly wrong with casting the result, or with adding the AssumeRole policy?
Thanks!