Our use case is to connect Azure Data Factory (ADF) to AWS S3, using the Managed Identity (MSI) of ADF for authentication and authorization.
TL;DR version
The problem we run into is that we need the access_token of the MSI inside ADF, so that we can exchange it for temporary credentials at the AWS IAM service. We need this access_token as plain text, so that we can pass it to the IAM service in the right way.
Situation (longer version)
At a high level, the solution should work like this:
- ADF will get an access token for a specific resource using MSI
- Using the access token, ADF will then get temporary credentials with AWS
- Using the temporary credentials, ADF will get data from S3.
In order to do this, we needed a couple of things (heavily inspired by this blog):
Azure side:
- We created an App Registration, and set an Application ID URI (which will be the 'scope' claim in the AzureAD access_token request).
- We created a custom role in that App Registration.
- In the Enterprise Application object of this App Registration (at this point, I feel like I should apologize for Microsoft's terminology...), we have ensured that User Assignment is required.
- We have assigned the custom role to our ADF MSI.
AWS side:
- Added our AzureAD as an Identity Provider
- Set the audience to the same value as Application ID URI.
- Added a new role with a trusted entity of type Web identity, and added the proper S3 permissions to it.
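For reference, the trust relationship on the AWS role might look roughly like the policy generated below. This is a sketch and not our exact policy: the account ID, tenant ID, Application ID URI and the helper name build_trust_policy are placeholders, and the condition key assumes the identity provider was registered in AWS with the issuer URL https://sts.windows.net/<tenant-id>/.

```python
import json


def build_trust_policy(account_id: str, tenant_id: str, audience: str) -> dict:
    """Build a trust policy for a role assumed via AssumeRoleWithWebIdentity.

    Assumption: the OIDC provider in AWS uses the issuer
    https://sts.windows.net/<tenant-id>/ (hypothetical for this sketch).
    """
    provider = f"sts.windows.net/{tenant_id}/"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The federated principal is the identity provider added in AWS.
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                # Only accept tokens whose audience is the Application ID URI.
                "Condition": {"StringEquals": {f"{provider}:aud": audience}},
            }
        ],
    }


if __name__ == "__main__":
    # All three values below are placeholders.
    policy = build_trust_policy("123456789012", "<tenant-id>", "<Application ID URI>")
    print(json.dumps(policy, indent=2))
```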
Then to test this all out, we created an Azure Function (http triggered) which returns the request headers as body. We then created a Web Activity in ADF to this Azure Function endpoint, and set the authentication to "System Assigned Managed Identity", with a resource the same as the aforementioned Application ID URI. The result is that we get the Authorization header value, which we then manually put into a request to the AWS IAM service to exchange for the temporary credentials. The request to the AWS IAM service has the format of https://sts.amazonaws.com/?Action=AssumeRoleWithWebIdentity&RoleSessionName=app1&RoleArn=<arn>&WebIdentityToken=<access token>. This provides us with credentials, which can be used in a Linked Service in ADF (we tested this).
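The manual step above can be sketched in Python as follows. This is only an illustration of what we did by hand, not part of the ADF pipeline: the helper names are made up, the claims are decoded without verifying the signature (useful only for checking that the aud claim matches the audience configured in AWS), and the URL uses the same query parameters as shown above.

```python
import base64
import json
from urllib.parse import urlencode

STS_ENDPOINT = "https://sts.amazonaws.com/"


def bearer_token(authorization_header: str) -> str:
    """Strip the 'Bearer ' prefix from an Authorization header value."""
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        raise ValueError("expected a header of the form 'Bearer <token>'")
    return token.strip()


def jwt_claims(token: str) -> dict:
    """Decode the JWT payload WITHOUT verifying the signature (inspection only)."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def sts_url(role_arn: str, token: str, session_name: str = "app1") -> str:
    """Build the AssumeRoleWithWebIdentity URL in the format used above."""
    query = urlencode(
        {
            "Action": "AssumeRoleWithWebIdentity",
            "RoleSessionName": session_name,
            "RoleArn": role_arn,
            "WebIdentityToken": token,
        }
    )
    return f"{STS_ENDPOINT}?{query}"


if __name__ == "__main__":
    # Placeholder value; in practice this is the header echoed by the Function.
    header = "Bearer <access token>"
    print(sts_url("<arn>", bearer_token(header)))
```

Printing jwt_claims(token)["aud"] before calling AWS is a quick way to confirm that the token was issued for the Application ID URI.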
Problem statement
We currently use an Azure Function to have ADF automatically get an access_token for the requested (AWS) resource (the Application ID URI) and add that access_token to the request to the Function, which does nothing but return it to us. We want to do this without an additional component. I can think of two ways:
- (option 1) - A web activity to some Microsoft endpoint that returns the access_token immediately.
- (option 2) - Have AWS take an Authorization header rather than a WebIdentityToken query parameter.
I spent some time on option 2, but that seems like a no go; the access_token really needs to be part of the URL parameters when trying to exchange it for temporary AWS credentials.
For option 1, however, I had an idea: there is the IMDS on virtual machines in Azure. This can be used to get access_tokens when you are on a VM rather than a PaaS service. I tried making a call to http://169.254.169.254/metadata/identity/oauth2/token?api-version=2021-12-13&resource=<Application ID URI> using a Web Activity (both with an AutoResolve IR and a Self-Hosted IR!), but I got the error [ClientSideException] Value does not fall within the expected range. I did set the header Metadata to the value true, as described in the docs.
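For comparison, this is roughly the same IMDS call as it would be made from Python on an Azure VM with a managed identity. It is a sketch under that assumption: the helper names are made up, the endpoint is only reachable from inside a VM, and it says nothing about what the Web Activity does internally.

```python
import json
import urllib.request
from urllib.parse import urlencode

IMDS_TOKEN_URL = "http://169.254.169.254/metadata/identity/oauth2/token"


def build_imds_request(resource: str, api_version: str = "2021-12-13") -> urllib.request.Request:
    """Build the GET request for a managed identity token from IMDS."""
    query = urlencode({"api-version": api_version, "resource": resource})
    # IMDS rejects requests that do not carry the 'Metadata: true' header.
    return urllib.request.Request(
        f"{IMDS_TOKEN_URL}?{query}", headers={"Metadata": "true"}, method="GET"
    )


def fetch_token(resource: str, timeout: float = 5.0) -> str:
    """Call IMDS and return the access_token field of the JSON response."""
    with urllib.request.urlopen(build_imds_request(resource), timeout=timeout) as resp:
        return json.load(resp)["access_token"]


if __name__ == "__main__":
    # Placeholder resource; only works when run on an Azure VM.
    print(fetch_token("<Application ID URI>"))
```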
Is there another way? Apologies if this is an abundance of information, but it does provide you with all the required details of what has been tried and how the setup should (and can) work.