
Our use case is to connect Azure Data Factory (ADF) to AWS S3, using the Managed Identity (MSI) of ADF for authentication and authorization.

TL;DR version

The problem we run into is that we need the access_token for the MSI in ADF, so that we can exchange it for temporary credentials in the AWS IAM service. We need this access_token as plain text, so that we can pass it to the IAM service in the form it expects.

Situation (longer version)

At a high level, the solution should work like this:

  1. ADF will get an access token for a specific resource using MSI
  2. Using the access token, ADF will then get temporary credentials with AWS
  3. Using the temporary credentials, ADF will get data from S3.

In order to do this, we needed a couple of things (heavily inspired by this blog):

Azure side:

  • We created an App Registration, and set an Application ID URI (which will be the 'scope' claim in the AzureAD access_token request).
  • We created a custom role in that App Registration.
  • In the Enterprise Application object of this App Registration (at this point, I feel like I should apologize for Microsoft's terminology), we have ensured that User Assignment is required.
  • We have assigned the custom role to our ADF MSI.

AWS side:

  • Added our AzureAD as an Identity Provider
  • Set the audience to the same value as Application ID URI.
  • Added a new role with a trusted entity of type Web Identity, and added proper S3 permissions to it.

To test all of this, we created an HTTP-triggered Azure Function which returns the request headers in the response body. We then created a Web Activity in ADF that calls this Azure Function endpoint, and set the authentication to "System Assigned Managed Identity", with the resource set to the aforementioned Application ID URI. As a result, we get the Authorization header value, which we then manually put into a request to the AWS IAM service to exchange it for temporary credentials. The request to the AWS IAM service has the format https://sts.amazonaws.com/?Action=AssumeRoleWithWebIdentity&RoleSessionName=app1&RoleArn=<arn>&WebIdentityToken=<access token>. This provides us with credentials, which can be used in a Linked Service in ADF (we tested this).
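For illustration, the manual exchange step described above could be sketched in Python as follows. This is a minimal sketch under assumptions, not the exact request that was sent: the helper names `bearer_token` and `build_sts_url` are hypothetical, the `Version` parameter does not appear in the URL above and is taken from AWS's documented query examples, and the role ARN in the usage note is a placeholder.

```python
from urllib.parse import urlencode

STS_ENDPOINT = "https://sts.amazonaws.com/"


def bearer_token(authorization_header: str) -> str:
    """Strip the 'Bearer ' scheme prefix from an Authorization header value."""
    scheme, _, token = authorization_header.strip().partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        raise ValueError("expected a header of the form 'Bearer <token>'")
    return token.strip()


def build_sts_url(access_token: str, role_arn: str, session_name: str = "app1") -> str:
    """Build the AssumeRoleWithWebIdentity query URL with URL-encoded parameters."""
    params = {
        "Action": "AssumeRoleWithWebIdentity",
        # Assumption: not part of the URL in the post; AWS's examples include it.
        "Version": "2011-06-15",
        "RoleSessionName": session_name,
        "RoleArn": role_arn,
        # The token travels as a query parameter, not as an Authorization header.
        "WebIdentityToken": access_token,
    }
    return STS_ENDPOINT + "?" + urlencode(params)
```

Calling `build_sts_url(bearer_token(header_value), "arn:aws:iam::123456789012:role/example")` yields a URL that can be requested with any HTTP client; the ARN here is only a placeholder.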

Problem statement

We currently use an Azure Function so that ADF automatically gets an access_token for the requested (AWS) resource (Application ID URI) and adds that access_token to the request to the Function, which does nothing more than return it to us. We want to do this without an additional component. I can think of two ways:

  • (option 1) - A web activity to some Microsoft endpoint that returns the access_token immediately.
  • (option 2) - Have AWS take an Authorization header rather than a WebIdentityToken query parameter.

I spent some time on option 2, but that seems like a no-go; the access_token really needs to be part of the URL parameters when exchanging it for temporary AWS credentials.

For option 1, however, I had an idea: there is the IMDS on virtual machines in Azure. This can be used to get access_tokens when you are on a VM rather than a PaaS service. I tried making a call to http://169.254.169.254/metadata/identity/oauth2/token?api-version=2021-12-13&resource=<Application ID URI> using a Web Activity (both with an AutoResolveIR and a SelfHosted IR!), but I got the error [ClientSideException] Value does not fall within the expected range. I did set the header Metadata to the value true, as described in the docs.
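For reference, the IMDS request described above could be expressed in Python roughly like this. It is only a sketch of the request shape (URL, query parameters and the Metadata header) and does not explain or work around the ADF error; the function name `build_imds_token_request` and the example resource value are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

IMDS_TOKEN_ENDPOINT = "http://169.254.169.254/metadata/identity/oauth2/token"


def build_imds_token_request(resource: str, api_version: str = "2021-12-13") -> Request:
    """Build (but do not send) the IMDS managed identity token request."""
    query = urlencode({"api-version": api_version, "resource": resource})
    # IMDS rejects requests that lack the 'Metadata: true' header.
    return Request(
        f"{IMDS_TOKEN_ENDPOINT}?{query}",
        headers={"Metadata": "true"},
        method="GET",
    )
```

On an Azure VM with a managed identity, sending this request (for example with `urllib.request.urlopen(req, timeout=5)`) returns JSON containing an `access_token` field. The address is link-local, so the call only succeeds from a host where IMDS is reachable.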

Is there another way? Apologies if this is an abundance of information, but it does provide you with all the required details of what has been tried and how the setup should (and can) work.

JarroVGIT

2 Answers


It sounds like you're using Azure AD as an identity provider in AWS. If possible, you can create an AWS user with a permanent access key/secret key. The AWS user can have access to your S3 buckets, and you won't need to deal with STS in ADF.

Another idea is to use Azure KeyVault. When you create your S3 linked service in ADF, you can parameterize the access key and secret key. Your AWS access key and secret key will be stored in Azure KeyVault. Then you can have an Azure Function that updates the KeyVault on a schedule or at the start of your ADF pipeline.

Joe
  • Thanks for your answer! Unfortunately, we do not want to use an AWS user with access/secret keys, as we have strict key rotation policies. It would require an additional component to deal with that automatically, which is not what we want. If we were to use an additional component, we would use an Azure Function that retrieves the keys we need from STS in AWS. But we are still looking for a way to avoid adding complexity to the setup (although I am increasingly convinced it cannot be done). – JarroVGIT Feb 19 '23 at 10:23
  • Have you considered setting your STS tokens to the max time to live of 36 hours, then using an Azure Function on a timer to refresh the tokens in KeyVault? It adds extra setup, but removes the complexity you currently have in your ADF pipelines. – Joe Feb 19 '23 at 15:18
  • The whole point is to not use an additional component. If we went the route of having an additional component, there wouldn't be an issue. The problem is that it is an incredibly expensive solution, as we are forced to run Azure Functions in a VNET (which requires a Premium plan, which is pay-per-hour and not pay-per-use). Therefore, we want to avoid implementing an additional component. – JarroVGIT Feb 19 '23 at 17:04
  • Ok I understand now. I thought you were just trying to move the Azure function calls out of your ADF pipelines to reduce complexity within ADF. Unfortunately, I'm not aware of any way to get STS tokens refreshed in ADF or KeyVault using a cheaper method. If you want something cheaper and easier, I strongly recommend creating a new AWS user and setting up rotation of the AWS secret keys and access keys (example here: https://aws.amazon.com/blogs/security/how-to-rotate-access-keys-for-iam-users/). This will be much easier than managing a federated AWS user and STS within Azure. – Joe Feb 19 '23 at 18:53
  • There are some ways to automatically rotate AWS user keys, although I've personally only done manual rotation on these. Here's an example using AWS organizations. You might be able to adapt this to your use case. I'd imagine this could work even without the organizations component: https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/automatically-rotate-iam-user-access-keys-at-scale-with-aws-organizations-and-aws-secrets-manager.html – Joe Feb 19 '23 at 18:58

After consultation with Microsoft, we have reached the conclusion that this cannot be done. We have opted to create an Azure Function, that will get an access token based on its own MI, which can only be called with the MI of the ADF instance. The Function will then store the temporary key in a KeyVault.

The call to the Azure Function will be synchronous. When it completes (which should take only seconds), a valid key is guaranteed to be in KeyVault. We can then use the standard copy activity with an S3 source, and point it to the KeyVault to retrieve the STS token. So the entire flow is (ADF=Azure Data Factory, AF=Azure Function, AAD=Azure Active Directory, KV=KeyVault, STS=AWS Security Token Service, MI=Managed Identity):

  1. ADF calls AF using the ADF MI.
  2. AF (Python, using msal package) retrieves access token from AAD for resource S3 (which is the application ID URI of the app registration that represents the S3 resource).
  3. AF (Python, using requests) retrieves temporary tokens from STS using AssumeRoleWithWebIdentity endpoint.
  4. AF stores AccessKeyId, SecretAccessKey and SessionToken in KV and returns status 200.
  5. ADF continues and calls copy activity, with a linked service to Amazon S3, using Temporary security credentials as Authentication Type and references the values of the KV.
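
As a rough illustration of steps 3 and 4, the three values could be pulled out of the STS XML response along these lines. This is a sketch under assumptions rather than the Function's actual code: the helper name `parse_sts_credentials` is hypothetical, and `SAMPLE_RESPONSE` is a hand-written placeholder shaped like AWS's documented response, not real output.

```python
import xml.etree.ElementTree as ET

CREDENTIAL_FIELDS = ("AccessKeyId", "SecretAccessKey", "SessionToken")

# Placeholder response shaped like the documented STS XML; values are fake.
SAMPLE_RESPONSE = """\
<AssumeRoleWithWebIdentityResponse xmlns="https://sts.amazonaws.com/doc/2011-06-15/">
  <AssumeRoleWithWebIdentityResult>
    <Credentials>
      <AccessKeyId>EXAMPLEKEYID</AccessKeyId>
      <SecretAccessKey>exampleSecret</SecretAccessKey>
      <SessionToken>exampleToken</SessionToken>
      <Expiration>2023-02-19T12:00:00Z</Expiration>
    </Credentials>
  </AssumeRoleWithWebIdentityResult>
</AssumeRoleWithWebIdentityResponse>
"""


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def parse_sts_credentials(response_xml: str) -> dict:
    """Return AccessKeyId, SecretAccessKey and SessionToken from an STS response."""
    root = ET.fromstring(response_xml)
    creds = next((el for el in root.iter() if local_name(el.tag) == "Credentials"), None)
    if creds is None:
        raise ValueError("no <Credentials> element in STS response")
    values = {local_name(child.tag): (child.text or "").strip() for child in creds}
    missing = [name for name in CREDENTIAL_FIELDS if not values.get(name)]
    if missing:
        raise ValueError(f"STS response is missing: {', '.join(missing)}")
    return {name: values[name] for name in CREDENTIAL_FIELDS}
```

Each of the three returned values could then be written to KeyVault as its own secret (for example with `SecretClient.set_secret` from the `azure-keyvault-secrets` package), so that the linked service in step 5 can reference them; the secret names are left to the implementer.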
JarroVGIT