
We have a custom Change Feed processor deployed in a single region in AKS with 5 instances, and it always ran fine in that single region. Note that each pod instance (feed processor) is assigned a unique instance name via [.WithInstanceName(new GUID)].

We recently moved to a multi-region setup as follows:

  • EastUS AKS cluster = 5 pods (5 feed processors, each with a unique instance name)
  • WestUS AKS cluster = 5 pods (5 feed processors, each with a unique instance name)

With this setup, the behavior is inconsistent: sometimes, after an AKS service deployment, our feed processors stop receiving events for some of the collections.

The only fix we have found is to delete the lease collection, after which everything starts working again.

We cannot go live with this workaround, so we need help resolving the issue.

Here is the code snippet:

Container leaseContainer = cosmosClient.GetContainer(databaseName, leaseContainerName);
changeFeedProcessor = cosmosClient.GetContainer(databaseName, sourceContainerName)
    .GetChangeFeedProcessorBuilder(
        processorName: sourceContainerName,
        async (IReadOnlyCollection<TContainer> changes, CancellationToken cancellationToken) =>
            await onChangesDelegate(changes, cancellationToken))
    .WithInstanceName($"{Guid.NewGuid()}")
    .WithLeaseContainer(leaseContainer)
    .Build();

where leaseContainerName = "container-lease"

Dadwals

1 Answer

The problem is that you are mixing instances. If you want each region to work independently (so that the same document change goes to both groups of processors independently), set a different processorName for each region.

When you define a cluster of machines with a particular processorName, lease container, and monitored container, you define a Deployment Unit. The change feed events are distributed across the machines in that unit.

If you deploy 2 clusters with the same values, then all 10 pods belong to the same Deployment Unit and the changes are spread across all 10 pods: a particular change lands on one instance in one of the regions, and the other region never sees it.
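To make the Deployment Unit idea concrete, here is a toy C# model of the two configurations. It is NOT how the SDK actually balances leases (the real processor uses its own load-balancing strategy, and the number of leases depends on the monitored container's partitions); the round-robin assignment, the count of 10 leases, and names such as eastus-pod-0 are assumptions chosen only to illustrate that each lease has exactly one owner per unit.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model (hypothetical, NOT the SDK's real load balancer): within one Deployment Unit
// every lease has exactly one owner, so each change is processed once per unit,
// no matter how many regions the unit's pods run in.
public static class DeploymentUnitDemo
{
    // Spread lease tokens round-robin across the instances of ONE deployment unit.
    public static Dictionary<string, string> AssignLeases(
        IReadOnlyList<string> leaseTokens, IReadOnlyList<string> instances)
    {
        var owners = new Dictionary<string, string>();
        for (int i = 0; i < leaseTokens.Count; i++)
        {
            owners[leaseTokens[i]] = instances[i % instances.Count];
        }
        return owners;
    }

    // Count the leases owned by instances whose name starts with the given prefix.
    public static int OwnedBy(Dictionary<string, string> owners, string prefix) =>
        owners.Values.Count(owner => owner.StartsWith(prefix, StringComparison.Ordinal));

    public static void Main()
    {
        List<string> leases = Enumerable.Range(0, 10).Select(i => $"lease-{i}").ToList();
        List<string> east = Enumerable.Range(0, 5).Select(i => $"eastus-pod-{i}").ToList();
        List<string> west = Enumerable.Range(0, 5).Select(i => $"westus-pod-{i}").ToList();

        // Same processorName in both clusters: one unit of 10 pods shares the 10 leases.
        var shared = AssignLeases(leases, east.Concat(west).ToList());
        int eastShared = OwnedBy(shared, "eastus");
        int westShared = OwnedBy(shared, "westus");
        Console.WriteLine($"Shared unit: EastUS {eastShared}/10, WestUS {westShared}/10");
        // prints "Shared unit: EastUS 5/10, WestUS 5/10"

        // processorName = region name: two independent units, each owning all 10 leases.
        int eastOwn = OwnedBy(AssignLeases(leases, east), "eastus");
        int westOwn = OwnedBy(AssignLeases(leases, west), "westus");
        Console.WriteLine($"Per-region units: EastUS {eastOwn}/10, WestUS {westOwn}/10");
        // prints "Per-region units: EastUS 10/10, WestUS 10/10"
    }
}
```

In the shared case each region only ever sees the changes from the leases its own pods happen to own, which is exactly the split described above.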

For example, you could use the region name as the processorName:

Container leaseContainer = cosmosClient.GetContainer(databaseName, leaseContainerName);
changeFeedProcessor = cosmosClient.GetContainer(databaseName, sourceContainerName)
    // processorName differs per region, so each region forms its own Deployment Unit
    .GetChangeFeedProcessorBuilder(
        processorName: regionName,
        async (IReadOnlyCollection<TContainer> changes, CancellationToken cancellationToken) =>
            await onChangesDelegate(changes, cancellationToken))
    .WithInstanceName($"{Guid.NewGuid()}")
    .WithLeaseContainer(leaseContainer)
    .Build();
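The answer leaves open where regionName comes from. One option, sketched here under assumptions, is to give each AKS cluster its own configuration value and derive the processorName from it. The REGION_NAME environment variable, the ProcessorNames.Resolve helper, and the base name "orders" are all hypothetical. Appending the region to a base name (rather than using the region alone) is a design choice that keeps the question's existing convention of one processorName per source container.

```csharp
#nullable enable
using System;

// Hypothetical helper: build a per-region processorName so each region forms its own
// Deployment Unit. REGION_NAME is an ASSUMED environment variable that you would set
// to a different value in the EastUS and WestUS deployment manifests.
public static class ProcessorNames
{
    public static string Resolve(string baseName, string? regionName)
    {
        if (string.IsNullOrWhiteSpace(regionName))
        {
            // Fail fast: silently falling back to a shared name would merge both regions into one unit.
            throw new InvalidOperationException("Region name is not set; refusing to share a processorName across regions.");
        }
        return $"{baseName}-{regionName.Trim().ToLowerInvariant()}";
    }

    public static void Main()
    {
        Console.WriteLine(Resolve("orders", "EastUS")); // prints "orders-eastus"
    }
}
```

In the service itself the call would look like ProcessorNames.Resolve(sourceContainerName, Environment.GetEnvironmentVariable("REGION_NAME")), so a missing setting stops the pod at startup instead of quietly joining the other region's unit.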
Matias Quaranta
  • But we do want all 10 pods in the same deployment unit. The issue is not that one region receives the event and the other does not; the issue is that neither region receives it. – Dadwals Mar 20 '22 at 00:23
  • Also, please note that we are using the same Cosmos instance for both regions. – Dadwals Mar 20 '22 at 00:28
  • If none of the regions is receiving the event, you should investigate whether the problem is related to processing: https://learn.microsoft.com/en-us/azure/cosmos-db/sql/change-feed-processor#life-cycle-notifications. Alternatively, another set of VMs with the same configuration may be taking those changes. You can easily verify this in your leases collection: check the Owner on the lease documents, which should match your instance names (you can use the pod ID as the instance name). If they don't match, that is the root cause. – Matias Quaranta Mar 21 '22 at 18:36
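Both checks from the last comment can be wired into the processor. A rough sketch, assuming the Microsoft.Azure.Cosmos v3 SDK and a live account: it registers the life-cycle notifications (WithLeaseAcquireNotification, WithLeaseReleaseNotification, WithErrorNotification) and, instead of opening the lease documents by hand, reads lease ownership through the change feed estimator's GetCurrentStateIterator, whose InstanceName corresponds to the lease Owner. MyItem and the processorName you pass in are placeholders, and using Environment.MachineName as the instance name assumes Kubernetes' default of setting the container hostname to the pod name.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

// Placeholder document type; replace with your own model.
public class MyItem
{
    public string id { get; set; } = "";
}

public static class ChangeFeedDiagnostics
{
    public static ChangeFeedProcessor BuildWithNotifications(
        Container monitored, Container leases, string processorName)
    {
        return monitored
            .GetChangeFeedProcessorBuilder<MyItem>(processorName, HandleChangesAsync)
            // Assumes the hostname is the pod name, so lease owners map back to pods.
            .WithInstanceName(Environment.MachineName)
            .WithLeaseAcquireNotification(leaseToken =>
            {
                Console.WriteLine($"Lease {leaseToken} acquired by {Environment.MachineName}");
                return Task.CompletedTask;
            })
            .WithLeaseReleaseNotification(leaseToken =>
            {
                Console.WriteLine($"Lease {leaseToken} released by {Environment.MachineName}");
                return Task.CompletedTask;
            })
            .WithErrorNotification((leaseToken, exception) =>
            {
                // Exceptions thrown by the handler arrive wrapped in ChangeFeedProcessorUserException.
                Console.WriteLine($"Lease {leaseToken} failed: {exception}");
                return Task.CompletedTask;
            })
            .WithLeaseContainer(leases)
            .Build();
    }

    private static Task HandleChangesAsync(
        IReadOnlyCollection<MyItem> changes, CancellationToken cancellationToken)
    {
        Console.WriteLine($"Received {changes.Count} changes");
        return Task.CompletedTask;
    }

    // Prints the current owner of every lease; compare InstanceName with your pod names.
    // processorName must match the one used by the running processors.
    public static async Task PrintLeaseOwnersAsync(
        Container monitored, Container leases, string processorName)
    {
        ChangeFeedEstimator estimator = monitored.GetChangeFeedEstimator(processorName, leases);
        using FeedIterator<ChangeFeedProcessorState> iterator = estimator.GetCurrentStateIterator();
        while (iterator.HasMoreResults)
        {
            foreach (ChangeFeedProcessorState state in await iterator.ReadNextAsync())
            {
                string owner = state.InstanceName ?? "(no owner)";
                Console.WriteLine($"Lease {state.LeaseToken}: owner {owner}, estimated lag {state.EstimatedLag}");
            }
        }
    }
}
```

With a fresh GUID per start, an owner value cannot be traced back to a pod; a stable pod name makes an unexpected owner (for example, a stale deployment still running with the same configuration) stand out immediately.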