
I want two different containers in my Cosmos DB database to contain the exact same data all the time. The only difference will be the partition key of the container.

What is the easiest way to accomplish this? I am looking for a method that requires little to no code, so I am not keen on the Data Factory solution that the internet seems to recommend. Perhaps there is another Azure service or a third-party service - or maybe it can be done robustly with simple triggers?

EDIT: Clarification - I need them to be continuously updated. One container is where all the data is changed during normal use, and the other container should be kept synchronized as those changes happen.

Niels Brinch
  • Please have a look at azcopy: https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-v10 – Markus Meyer Oct 14 '20 at 11:03
  • @MarkusMeyer at first glance there seem to be two issues with that. 1: It does not support Cosmos DB, but maybe I am missing something. 2: It is more for single operations and not for continuously keeping two containers synchronized as one of them is changed. – Niels Brinch Oct 14 '20 at 11:07
  • It supports DocumentDB. This is the former name of Cosmos DB. – Markus Meyer Oct 14 '20 at 11:16
  • @MarkusMeyer The examples I can find using AzCopy are about migrations and not about keeping two containers synchronized. Are you sure it is suitable for that purpose and if so, could you provide some guidance on where to start? – Niels Brinch Oct 14 '20 at 11:40
  • OK. I solved this for myself with an Azure Function. There's a trigger on collection A and the Function stores the document in collection B. – Markus Meyer Oct 14 '20 at 11:45
  • I might have to do that, but it’s something that requires code and that code can fail, so I was hoping for something built in and simple/robust – Niels Brinch Oct 14 '20 at 13:40

2 Answers


This is the function code for it:

using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

namespace CosmosDBSyncFunction
{
    public static class SyncCosmosDb
    {
        [FunctionName(nameof(Sync))]
        public static async Task Sync(
            // Change feed trigger: fires for every insert/update in the source
            // collection (deletes are not captured by the change feed).
            [CosmosDBTrigger(
                databaseName: "evaluation",
                collectionName: "lorem",
                ConnectionStringSetting = "cosmos-mm-eval",
                LeaseCollectionName = "leases",
                CreateLeaseCollectionIfNotExists = true
            )] IReadOnlyList<Document> input,
            // Output binding: upserts each document into the target collection.
            [CosmosDB(
                databaseName: "evaluation",
                collectionName: "ipsum",
                ConnectionStringSetting = "cosmos-mm-eval")] IAsyncCollector<Document> output,
            ILogger log)
        {
            log.LogInformation($"Replicating {input.Count} document(s) from lorem to ipsum.");

            foreach (var item in input)
            {
                // Await each write so failures surface and the batch can be retried.
                await output.AddAsync(item);
            }
        }
    }
}

and the settings that have to be configured in local.settings.json:

{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "UseDevelopmentStorage=true",
    "AzureWebJobsDashboard": "UseDevelopmentStorage=true",
    "cosmos-mm-eval": "secret",
    "FUNCTIONS_WORKER_RUNTIME": "dotnet"
  }
}
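
Since the two containers differ only in partition key, the target's partition key path just has to point at a property that already exists on the documents; the output binding above then upserts them unchanged. Here is a minimal provisioning sketch with the Microsoft.Azure.Cosmos v3 SDK - the paths /id and /category are placeholders, since the question doesn't name the actual keys:

using System;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class ProvisionContainers
{
    public static async Task Main()
    {
        // Connection string kept out of source, mirroring the "cosmos-mm-eval" setting above.
        var client = new CosmosClient(Environment.GetEnvironmentVariable("cosmos-mm-eval"));
        Database db = await client.CreateDatabaseIfNotExistsAsync("evaluation");

        // Same documents, different partition keys: /id and /category are hypothetical paths.
        await db.CreateContainerIfNotExistsAsync("lorem", "/id");
        await db.CreateContainerIfNotExistsAsync("ipsum", "/category");
    }
}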
Markus Meyer
  • Looks like there is not a lot of code that could fail. I worry a bit about intermittent errors, like a briefly lost connection, and whether some changes might then be lost. Do you have some idea about how likely that is? – Niels Brinch Oct 14 '20 at 21:33
  • In general, there's no issue with connections. Azure Functions and Cosmos DB have very high availability. But it also depends on the connection complexity introduced by VNETs, etc. Of course, you should use App Insights for logging, which will also log any issues. – Markus Meyer Oct 15 '20 at 03:41
  • Yes, and then I have to write some custom sync code on the side to fix issues that occurred. But it sounds like this approach is indeed "the easiest" although I had hoped for easier. I'll keep the question open a bit longer, holding out. – Niels Brinch Oct 15 '20 at 07:12

If you don't want to write your own code to do this, you can use the Cosmos DB Live Data Migrator repo.

It can be deployed via an Azure deploy button. Once deployed, you can open the website, enter all of the required information, and click another button; it will then keep the two containers in sync with different partition keys.

Mark Brown
  • Thank you very much for that. I actually need this because I will implement according to the principles you showed at Microsoft in Copenhagen a while back. – Niels Brinch Oct 14 '20 at 21:10
  • Unfortunately it doesn't seem like a good fit for my purpose, because it only supports keeping two containers in sync, and that requires two app services. So I would have to deploy another two app services each time I want to connect two containers. In this first case I will take two containers and sync them into one container (with a new partition key), so that is already four app services, which is a bit much. – Niels Brinch Oct 14 '20 at 21:26
  • Then for your scenario the easiest thing is to just use the change feed processor to read from the source container and create separate CosmosContainer instances for each container you want to keep in sync. Then host it in an Azure Function or whatever compute you want. This is pretty easy to do (see the sketch after this thread). – Mark Brown Oct 14 '20 at 22:16
  • Yes, I think I will end up doing that. Thanks for the sparring. I had hoped for something hidden and failsafe, like the "replicate data globally" feature. Ironically it cannot replicate data non-globally :) – Niels Brinch Oct 15 '20 at 07:47
  • huh? Cosmos has replication built in. You can replicate data to any region in Azure. If that's what you are looking for go to the Azure Portal and find "replicate data globally" – Mark Brown Oct 15 '20 at 14:57
  • Yes I know. I want to replicate from one local container to another. I was only pointing out the irony in there being a flawless solution for global replication built-in but none for local replication. – Niels Brinch Oct 15 '20 at 17:49
  • Ah, I see. Yes, we are actually working on making that super easy for people right now, but it won't be out until end of year / early next year. Thx. – Mark Brown Oct 15 '20 at 21:05
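
For reference, here is a minimal sketch of the change feed processor approach Mark Brown describes above, using the Microsoft.Azure.Cosmos v3 SDK. The database and container names are taken from the answer above; the connection string variable, processor name, and instance name are placeholders:

using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class ChangeFeedSync
{
    public static async Task Main()
    {
        // Hypothetical environment variable holding the account connection string.
        var client = new CosmosClient(Environment.GetEnvironmentVariable("COSMOS_CONNECTION_STRING"));
        Database db = client.GetDatabase("evaluation");

        Container source = db.GetContainer("lorem");   // read side
        Container leases = db.GetContainer("leases");  // change feed bookkeeping

        // One processor can fan out to any number of destination containers.
        Container[] destinations = { db.GetContainer("ipsum") };

        ChangeFeedProcessor processor = source
            .GetChangeFeedProcessorBuilder<dynamic>(
                processorName: "containerSync",
                onChangesDelegate: async (IReadOnlyCollection<dynamic> changes, CancellationToken ct) =>
                {
                    foreach (var item in changes)
                    {
                        foreach (var destination in destinations)
                        {
                            // Upsert keeps replays of the same change idempotent.
                            await destination.UpsertItemAsync(item, cancellationToken: ct);
                        }
                    }
                })
            .WithInstanceName("host-1")
            .WithLeaseContainer(leases)
            .Build();

        await processor.StartAsync();
        Console.WriteLine("Syncing. Press Enter to stop.");
        Console.ReadLine();
        await processor.StopAsync();
    }
}

Upserting rather than inserting matters here: the change feed delivers at-least-once, so the same document may be handed to the delegate more than once after a restart.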