I am trying to create an HDInsight cluster in a certain subscription. The default storage type I am selecting is ADLS Gen2, and the storage account exists in the same subscription (the UI lists only the ADLS Gen2 storage accounts in the same subscription anyway). As you can see in the screenshot below, the UI also asks for a user-assigned managed identity as a required field.

I don't understand why this identity is needed here. Since the cluster and the ADLS Gen2 account are in the same subscription, the cluster should be able to access the storage anyway: as I understand it, the storage keys are fetched dynamically during cluster deployment because both resources are in the same subscription, and that is how the storage gets attached. If that happens anyway, why do I need to specify a user-assigned managed identity?

I also verified that the option to enter a user-assigned managed identity is shown only when the storage type is ADLS Gen2, not for ADLS Gen1 or Azure Storage. ADLS Gen2 has a blob interface as well as a directory interface, but these are just interfaces; underneath, it is still blob storage, which has access keys. In fact, ADLS Gen1 has no access keys at all, since it provides only a directory interface, yet we don't need to specify a user-assigned managed identity for it. So I wonder why ADLS Gen2 requires one when all the resources are in the same subscription.

- Maybe it does not use the storage access keys but instead uses the managed identity to do AAD authentication to the Storage? – juunas Oct 25 '19 at 10:17
- that's the question, why is it a 'required' field if there is an option to let it use access keys, why does it have to use the managed identity only? – Dhiraj Oct 25 '19 at 10:22
- Because MI is more secure: you can assign restricted privileges to it - storage key gives full control and secrets leakage is less likely. – Marc Oct 25 '19 at 20:01
1 Answer
A common challenge when building cloud applications is how to manage the credentials in your code for authenticating to cloud services. Keeping the credentials secure is an important task. Ideally, the credentials never appear on developer workstations and aren't checked into source control. Azure Key Vault provides a way to securely store credentials, secrets, and other keys, but your code has to authenticate to Key Vault to retrieve them.
The managed identities for Azure resources feature in Azure Active Directory (Azure AD) solves this problem. The feature provides Azure services with an automatically managed identity in Azure AD. You can use the identity to authenticate to any service that supports Azure AD authentication, including Key Vault, without any credentials in your code.
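To make "without any credentials in your code" concrete, here is a minimal sketch of how code running on an Azure VM can ask the Azure Instance Metadata Service (IMDS) for a token using a managed identity. It only builds the request, so it can be inspected anywhere. The helper name `build_token_request` and the client ID in the usage note are hypothetical, and this illustrates the general mechanism rather than what HDInsight does internally when it attaches ADLS Gen2 storage.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

# Link-local IMDS token endpoint, reachable only from inside an Azure resource.
IMDS_TOKEN_ENDPOINT = "http://169.254.169.254/metadata/identity/oauth2/token"


def build_token_request(resource: str, client_id: Optional[str] = None) -> Request:
    """Build (but do not send) a managed identity token request.

    resource:  the Azure AD resource the token is for,
               e.g. "https://storage.azure.com/" for Azure Storage.
    client_id: client ID of a user-assigned identity; omit it to use the
               system-assigned identity, if one exists.
    """
    params = {"api-version": "2018-02-01", "resource": resource}
    if client_id is not None:
        params["client_id"] = client_id
    # IMDS rejects requests that lack the "Metadata: true" header.
    return Request(
        f"{IMDS_TOKEN_ENDPOINT}?{urlencode(params)}",
        headers={"Metadata": "true"},
    )
```

Note that the request contains no secret: no storage key, password, or certificate. Sending it with `urllib.request.urlopen(build_token_request("https://storage.azure.com/", client_id="<your-identity-client-id>"))` would only succeed on an Azure resource that has the identity assigned, and the JSON response would carry the token in its `access_token` field.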
A managed identity is an identity registered in Azure Active Directory (Azure AD) whose credentials are managed by Azure. With managed identities, you don't need to register service principals in Azure AD or maintain credentials such as certificates.
Managed identities can be used in Azure HDInsight to allow your clusters to access Azure AD Domain Services, access Azure Key Vault, or access files in Azure Data Lake Storage Gen2.
A user-assigned managed identity is created as a standalone Azure resource. Through a create process, Azure creates an identity in the Azure AD tenant that's trusted by the subscription in use. After the identity is created, the identity can be assigned to one or more Azure service instances. The lifecycle of a user-assigned identity is managed separately from the lifecycle of the Azure service instances to which it's assigned.
Reference: Managed identities in Azure HDInsight
Hope this helps.
