Short Version:

I am creating an Azure Active Directory group, an Azure Key Vault to which the group has access, a key in that vault, and a PostgreSQL server whose principal is a member of the group. The server is wrapped in a ComponentResource.

The server is supposed to use the key for encryption, but it does not have access at first: it can only access the key if there is a delay before it tries to use it.

Question: How can I make sure that the permissions have propagated before trying to use the key to encrypt?

Long Version:

The infrastructure is the same as I have described above. It is created like this:

group = azuread.Group(
    "security_group",
    display_name=display_name,
    description=description,
    owners=None,
    opts=ResourceOptions(),
    security_enabled=True,
)

policy = azure.keyvault.AccessPolicyEntryArgs(
    object_id=group.object_id,
    tenant_id=tenant_id,
    permissions=azure.keyvault.PermissionsArgs(
        keys=['get', 'list', 'unwrapKey', 'wrapKey']
    )
)

vault = azure.keyvault.Vault(
    "key_vault",
    resource_group_name=resource_group_name,
    location=location,
    properties=azure.keyvault.VaultPropertiesArgs(
        access_policies=[policy],
        # other properties
    ),
    opts=opts
)

class Postgres(ComponentResource):
    def __init__(self, group, vault):
        server = azure.dbforpostgresql.Server(
            "postgres_server",
            identity=azure.dbforpostgresql.ResourceIdentityArgs(type="SystemAssigned"),
            # other properties
        )

        postgres_principal_group_membership = azuread.GroupMember(
            "postgres-group-member",
            group_object_id=group.object_id,
            member_object_id=server.identity.principal_id,
        )


        key = azure.keyvault.Key(
            "encryption-key",
            key_name="encryption-key",
            # other properties
        )

        # Despite its name, this helper builds the ServerKey name
        # "<vault>_<key>_<version>" from the versioned key URI.
        def make_key_uri_with_version(args):
            vault_name, key_name, key_uri_with_version = args
            key_version = key_uri_with_version.rsplit('/', 1)[-1]
            return f"{vault_name}_{key_name}_{key_version}"

        postgres_key_name = Output.all(
            vault.name,
            key.name,
            key.key_uri_with_version,
        ).apply(make_key_uri_with_version)


        ### Everything before here is created without issues
        ### This resource cannot be created during the first attempt to run this, but creation succeeds during the second run
        ### We can also make it succeed during the first try if we do:
        # import time
        # time.sleep(120)
        ### This sleep does not need to be placed right here or inside the constructor at all.
        ### It will help as long as it happens after creating the vault, but before completing the constructor

        azure.dbforpostgresql.ServerKey(
            "server-use-encryption-key",
            key_name=postgres_key_name,
            server_key_type="AzureKeyVault",
            server_name=server.name,
            uri=key.key_uri_with_version,
            # other properties
        )

Postgres(group, vault)
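
To make the naming step concrete, here is a small standalone sketch of what the helper passed to `apply` computes. The function name `make_server_key_name`, the vault name, and the key URI are made-up illustration values; the `<vault>_<key>_<version>` shape is the one the code above builds.

```python
def make_server_key_name(vault_name: str, key_name: str, key_uri_with_version: str) -> str:
    """Build "<vault>_<key>_<version>" from a versioned Key Vault key URI."""
    # The key version is the last path segment of the versioned key URI.
    key_version = key_uri_with_version.rstrip("/").rsplit("/", 1)[-1]
    return f"{vault_name}_{key_name}_{key_version}"


# Hypothetical values, for illustration only.
example_name = make_server_key_name(
    "my-vault",
    "encryption-key",
    "https://my-vault.vault.azure.net/keys/encryption-key/0123456789abcdef",
)
print(example_name)
```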

When this is executed for the first time, the following error occurs:

Error: resource partially created but read failed autorest/azure: Service returned an error.
Status=404
Code="ResourceNotFound"
Message="The requested resource of type 'Microsoft.DBforPostgreSQL/servers/keys' with name '<name of the key>' was not found.":
Code="AzureKeyVaultMissingPermissions" 
Message="The server '<name of the postgres server>' requires following Azure Key Vault permissions: 'Get, WrapKey, UnwrapKey'. Please grant any missing permissions to the service principal with ID '<postgres server principal ID>'."

I have verified that this is a timing problem by bluntly calling time.sleep(120) before attaching the encryption key, which made the process work. To make the code more stable and faster (rather than waiting for a fixed time and hoping that it will be long enough), I think that checking for the actual permissions is the way to go.
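
One way to express "check instead of sleep" is a polling helper with a deadline. The following is only a sketch under assumptions: `wait_until` and `permission_is_effective` are hypothetical names, and the check is a stub, since no real call that reports whether the permission is effective has been identified here.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 300.0, interval: float = 5.0) -> float:
    """Poll check() until it returns True and return the seconds waited.

    Raises TimeoutError if the deadline passes before check() succeeds.
    """
    start = time.monotonic()
    while True:
        if check():
            return time.monotonic() - start
        if time.monotonic() - start >= timeout:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Stub standing in for a real permission check: reports ready on the third call.
calls = {"count": 0}


def permission_is_effective() -> bool:
    calls["count"] += 1
    return calls["count"] >= 3


waited = wait_until(permission_is_effective, timeout=5.0, interval=0.01)
```

Compared with a fixed `time.sleep(120)`, this returns as soon as the check passes and fails loudly if it never does.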

Currently, I don't see any way to do this with the azure-native provider by Pulumi.

Therefore, I am thinking of using the Azure API directly, which would imply a new dependency and is not ideal in itself.

Question: Is there a better way to achieve the desired result? I've tried various explicit dependsOn values when setting the encryption key, with no success.

Thanks!

Kjeld Schmidt
  • Do you know what exactly you'd need to check? The issue seems to be caused by a delay in propagation from Azure AD to KeyVault/ARM - do you know how to validate that it's completed? – Mikhail Shilkov Sep 07 '21 at 10:02
  • What we need to validate is that the AD permission for the Postgres server to access the key vault is in place. ARM seems to return "created" to Pulumi before the permission takes effect. That is why the Pulumi code then crashes when the Postgres server tries to access the customer-managed key in the key vault. The check would have to make sure that the "permission postgres -> keyvault" is effective. – Michael Lihs Sep 07 '21 at 11:24
  • Do you know the API endpoint that would allow that? – Mikhail Shilkov Sep 07 '21 at 12:05
  • Actually it seems that the endpoint that gives us the vault permissions has the AD group and permissions right away. So the only check that comes to my mind would be trying to read the secret from the vault as the expected user, but since this is a managed identity in our case, I wouldn't know of any way to implement this. In other words: the permissions seem to be in place right away but are not effective for an unknown amount of time. I also have to add that we are trying all this within a component resource where everything happens in the constructor. – Michael Lihs Sep 07 '21 at 12:18
  • If you can find an API call that would allow you to check this, you can use an approach similar to the one described here https://github.com/pulumi/pulumi-aws/issues/673#issuecomment-569944177 and update your `make_key_uri_with_version` function to query for key readiness. – Artem Yarmoliuk Sep 09 '21 at 11:10
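
The last suggestion could look roughly like the sketch below: the function handed to `apply` blocks until a readiness check passes and only then returns the key name, so a resource that consumes that output is not created earlier. `make_gated_key_name` and `key_is_ready` are hypothetical names, and the check is stubbed because no API call that reports effective permissions was identified in this thread. The sketch avoids importing Pulumi so that it runs on its own.

```python
import time
from typing import Callable, Sequence


def make_gated_key_name(
    key_is_ready: Callable[[str], bool],
    timeout: float = 300.0,
    interval: float = 5.0,
) -> Callable[[Sequence[str]], str]:
    """Return a function for Output.all(...).apply(...) that waits before naming the key."""

    def make_key_name(args: Sequence[str]) -> str:
        vault_name, key_name, key_uri_with_version = args
        deadline = time.monotonic() + timeout
        # Block until the (hypothetical) readiness check passes or time runs out.
        while not key_is_ready(key_uri_with_version):
            if time.monotonic() >= deadline:
                raise TimeoutError(f"key {key_uri_with_version} not usable after {timeout}s")
            time.sleep(interval)
        key_version = key_uri_with_version.rsplit("/", 1)[-1]
        return f"{vault_name}_{key_name}_{key_version}"

    return make_key_name


# Stub readiness check that succeeds on the second attempt (illustration only).
attempts = []


def fake_ready(uri: str) -> bool:
    attempts.append(uri)
    return len(attempts) >= 2


gated = make_gated_key_name(fake_ready, timeout=5.0, interval=0.01)
name = gated(
    ["my-vault", "encryption-key", "https://my-vault.vault.azure.net/keys/encryption-key/abc123"]
)
# In the Pulumi program this would be wired up as:
#   Output.all(vault.name, key.name, key.key_uri_with_version).apply(gated)
```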
