
I am trying to add documents that have a nested array to Cosmos DB, using the Copy Activity.

Sample Document:

{
  "itemNumber": "D10001",
  "readings": [
    { "value": 25, "ets": "100011111" },
    { "value": 35, "ets": "100011122" }
  ]
}

In the source dataset I formatted the readings array as a string in my SQL query, and set the data type in the sink dataset to Object. The data is copied, but the readings array arrives stringified.
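
To illustrate, the document that actually lands in Cosmos DB looks like this (my reconstruction from the sample above, with readings as a single string):

{
  "itemNumber": "D10001",
  "readings": "[{\"value\": 25, \"ets\": \"100011111\"}, {\"value\": 35, \"ets\": \"100011122\"}]"
}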

Is there a means to configure the Copy Activity to handle this array?

Mike Henderson

2 Answers


As far as I know, there is no setting in the ADF Cosmos DB connector that converts string data to array format.

Since you are importing the data with ADF, you also can't use a pre-trigger to change the format of the created documents; pre-triggers have to be invoked explicitly through code or the REST API.
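
(For context: a pre-trigger only runs when the caller opts in on each request, for example via the .NET SDK. The fragment below assumes a DocumentClient named client as in the function further down, and "fixReadings" is a hypothetical trigger registered on the collection; ADF's Copy Activity exposes no way to set this option.)

// Hypothetical: invoke a pre-trigger named "fixReadings" when creating a document.
await client.CreateDocumentAsync(
    UriFactory.CreateDocumentCollectionUri("db", "item"),
    newDocument,
    new RequestOptions { PreTriggerInclude = new List<string> { "fixReadings" } });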

So, as a workaround, I suggest using an Azure Function with a Cosmos DB trigger to process each document after it is imported into the database. Please refer to my function code:

using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Host;
using Newtonsoft.Json.Linq;

namespace TestADF
{
    public static class Function1
    {
        [FunctionName("Function1")]
        public static async Task Run([CosmosDBTrigger(
            databaseName: "db",
            collectionName: "item",
            ConnectionStringSetting = "documentdbstring",
            LeaseCollectionName = "leases")]IReadOnlyList<Document> input, TraceWriter log)
        {
            if (input != null && input.Count > 0)
            {
                log.Verbose("Start.........");
                String endpointUrl = "https://***.documents.azure.com:443/";
                String authorizationKey = "key";
                String databaseId = "db";
                String collectionId = "item";

                DocumentClient client = new DocumentClient(new Uri(endpointUrl), authorizationKey);

                for (int i = 0; i < input.Count; i++)
                {
                    Document doc = input[i];

                    // Skip documents this function has already converted; without this
                    // flag the trigger would fire again on our own replace below.
                    if (!doc.GetPropertyValue<bool>("alreadyFormat"))
                    {
                        // The Copy Activity wrote "readings" as a JSON string,
                        // so parse it back into a real array.
                        String readings = doc.GetPropertyValue<String>("readings");
                        JArray r = JArray.Parse(readings);

                        doc.SetPropertyValue("readings", r);
                        // Mark the document as converted before writing it back.
                        doc.SetPropertyValue("alreadyFormat", true);

                        await client.ReplaceDocumentAsync(
                            UriFactory.CreateDocumentUri(databaseId, collectionId, doc.Id), doc);

                        log.Verbose("Update document Id " + doc.Id);
                    }
                }
            }
        }
    }
}
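
Note that the ConnectionStringSetting = "documentdbstring" binding reads from the function app's application settings; for local development that means an entry like this in local.settings.json (values here are placeholders):

{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "UseDevelopmentStorage=true",
    "documentdbstring": "AccountEndpoint=https://***.documents.azure.com:443/;AccountKey=key;"
  }
}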

Hope it helps you.

Jay Gong
  • This looks promising. The gotcha for me is that the ADF pipeline will be loading data across several collections, and adding a Cosmos DB trigger per collection may involve a good deal of administrative overhead. – Mike Henderson Jul 20 '18 at 13:04

What is your source? You could copy your data to JSON files first, and then import the files into Cosmos DB as-is, meaning you don't specify a format or structure in your source and sink datasets.
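
For example (my sketch of the intermediate step, reusing the sample document from the question), the staged file would hold one complete JSON document per line, which the Cosmos DB sink can then import without any column mapping:

{"itemNumber":"D10001","readings":[{"value":25,"ets":"100011111"},{"value":35,"ets":"100011122"}]}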

Fang Liu
  • The source is a SQL Server database. It's an older version, so some of the newer JSON-related features are not available. I was hoping to avoid an intermediate step and transform in the pipeline, but that may not be possible. – Mike Henderson Jul 20 '18 at 12:46
  • Is it possible for you to use a stored procedure to read the data? ADF v2 supports reading from a stored procedure. – Fang Liu Jul 20 '18 at 13:36
  • I am using a stored procedure. Initial attempts involved formatting the nested array as a stringified JSON object, but so far I have only been able to get the property into Cosmos DB as a string. – Mike Henderson Jul 20 '18 at 14:45
  • After trying a few different approaches, this answer ended up being the best for what I need to do. – Mike Henderson Jul 21 '18 at 13:00