
I have a number of containers in Cosmos DB, and that number changes all the time. I need a mechanism for reading all the changes from those containers.

I'm trying to implement a builder/factory for the Change Feed Processor (CFP). In my case, I have to create CFP instances dynamically for different containers. The solution as I see it right now: I need a WebJob/console application that listens to a queue. When another application creates a new container in Cosmos DB, it also sends a new message to the queue. The message contains all the information needed to create a new CFP (connection string, collection name, lease container name, etc.). The application creates a new CFP and runs it on a new background thread indefinitely.

Here is the code I use to create a new CFP:

// Holds the background tasks so that their references are not discarded.
private readonly List<Task> _processorTasks = new List<Task>();

private void StartNewProcessor()
{
    _processorTasks.Add(Task.Run(async () =>
    {
        // The monitored container and the container that stores the leases.
        var container = Database.GetContainer(ContainerName);
        var lease = Database.GetContainer(LeaseName);

        var changeFeedProcessor = container.GetChangeFeedProcessorBuilder<Item>(ProcessorName, ProcessData)
            .WithLeaseContainer(lease)
            .WithInstanceName(InstanceName)
            .Build();

        await changeFeedProcessor.StartAsync();

        Console.WriteLine($"Change Feed Processor: {ProcessorName} has been started");

        // Keep the processor running until a key is pressed.
        Console.ReadKey(true);

        await changeFeedProcessor.StopAsync();
    }));
}

The problem is that this is a bad approach: there may be 100 or more collections in the future, so I would need to create 100 extra background threads. I'm looking for ideas on how to architect the application and do all of this the right way. It would be great if changes for all containers could be handled in one application.
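One possible direction, sketched under assumptions: a processor started with StartAsync keeps running after the call returns, so a dedicated Task.Run per container should not be needed. A single application could instead keep one entry per container in a dictionary and start or stop entries as containers come and go. ProcessorHandle, ProcessorRegistry, and the factory delegate are hypothetical names, not part of the Cosmos SDK; the start and stop delegates are assumed to wrap ChangeFeedProcessor.StartAsync and StopAsync.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical wrapper around the start/stop operations of one processor.
public sealed class ProcessorHandle
{
    public ProcessorHandle(Func<Task> start, Func<Task> stop)
    {
        Start = start ?? throw new ArgumentNullException(nameof(start));
        Stop = stop ?? throw new ArgumentNullException(nameof(stop));
    }

    public Func<Task> Start { get; }
    public Func<Task> Stop { get; }
}

// Hypothetical registry: one processor per container name, no thread per processor.
public sealed class ProcessorRegistry
{
    private readonly ConcurrentDictionary<string, ProcessorHandle> _processors =
        new ConcurrentDictionary<string, ProcessorHandle>();
    private readonly Func<string, ProcessorHandle> _factory;

    public ProcessorRegistry(Func<string, ProcessorHandle> factory)
    {
        _factory = factory ?? throw new ArgumentNullException(nameof(factory));
    }

    public int Count => _processors.Count;

    // Intended to be called when a "container created" message arrives.
    // Returns false if the container is already being monitored.
    public async Task<bool> AddAsync(string containerName)
    {
        if (_processors.ContainsKey(containerName))
        {
            return false;
        }

        var handle = _factory(containerName);
        if (!_processors.TryAdd(containerName, handle))
        {
            return false;
        }

        try
        {
            await handle.Start();
        }
        catch
        {
            // Do not keep an entry for a processor that failed to start.
            _processors.TryRemove(containerName, out _);
            throw;
        }

        return true;
    }

    // Intended to be called when a container is removed.
    // Returns false if no processor was registered under that name.
    public async Task<bool> RemoveAsync(string containerName)
    {
        if (!_processors.TryRemove(containerName, out var handle))
        {
            return false;
        }

        await handle.Stop();
        return true;
    }
}
```

With the SDK, the factory could build the processor the same way as in the code above and return new ProcessorHandle(processor.StartAsync, processor.StopAsync). The queue listener would then only call AddAsync or RemoveAsync per message.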

  • Why do you have so many collections? – Mark Brown Jan 27 '21 at 15:30
  • @MarkBrown I'm not the person who architected it. We use containers as a logical separation for the data. – CognitiveComplexity Jan 27 '21 at 15:39
  • That's unfortunate because putting different entities in different containers is an anti-pattern for Cosmos. Data should be stored in the same container based upon its access pattern, not its schema. That said, I don't know why you think your approach of an array of Tasks is bad. I can't think of any other way to do it. – Mark Brown Jan 27 '21 at 17:07
  • Thank you for your response @MarkBrown. I think it's bad because I don't know how to handle it when our database grows to 200-1000 containers – CognitiveComplexity Jan 27 '21 at 17:12
  • @MarkBrown also, this approach is bad because I can't remove tasks that are no longer needed, for example when a collection is removed – CognitiveComplexity Jan 27 '21 at 17:20
  • Why not use another collection, like a dictionary, with the container name as its key? Agreed, this is not ideal. If you can, I would look to migrate this such that your logical boundary is a partition key rather than a container. This can get very expensive with so many containers. – Mark Brown Jan 27 '21 at 17:29
  • @MarkBrown the approach with an array is good, but we also have another Cosmos DB where I need to write the data that comes from the change feed. The problem is that we also have many containers in the second Cosmos DB. The names and number of containers are exactly the same as in the first Cosmos DB. It's something like a replica of the first database but with modified data. What I can't figure out in this approach is how to find where (in which container of the second database) I need to write the changes from the change feed. – CognitiveComplexity Jan 28 '21 at 09:34
  • So, in case it's still not clear, let me give you an example. A new message comes from the change feed and I start processing it inside my delegate. How can I tell, inside the delegate method, which container this message came from? – CognitiveComplexity Jan 28 '21 at 09:34
  • You'll know because change feed processor requires the name of the container to monitor when you create a new instance of it. – Mark Brown Jan 28 '21 at 15:56
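The last point in the comments can be illustrated with a closure: because the container name is known when each processor is created, the delegate can capture it. The sketch below is a minimal illustration, assuming the handler has the (changes, cancellationToken) shape; HandlerFactory and writeToTarget are hypothetical names. In a real application, writeToTarget might look up the container with the same name in the second database and upsert the item there.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper that builds one change handler per monitored container.
public static class HandlerFactory
{
    // Returns a handler that remembers which container it was created for.
    // writeToTarget is an assumed callback: (containerName, item, token) => Task.
    public static Func<IReadOnlyCollection<T>, CancellationToken, Task> ForContainer<T>(
        string containerName,
        Func<string, T, CancellationToken, Task> writeToTarget)
    {
        if (containerName == null) throw new ArgumentNullException(nameof(containerName));
        if (writeToTarget == null) throw new ArgumentNullException(nameof(writeToTarget));

        return async (changes, cancellationToken) =>
        {
            foreach (var item in changes)
            {
                // containerName is captured by the closure, so every change
                // is routed to the target that matches its source container.
                await writeToTarget(containerName, item, cancellationToken);
            }
        };
    }
}
```

A lambda with the same body and parameters could be passed directly as the delegate argument of GetChangeFeedProcessorBuilder, so each processor gets its own handler bound to its own container name.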

0 Answers