1

I'm trying to archive old data from CosmosDB into Azure Tables but I'm very new to Azure Data Factory and I'm not sure what would be a good approach to do this. At first, I thought that this could be done with a Copy Activity but because the properties from my documents stored in the CosmosDB source vary, I'm getting mapping issues. Any idea on what would be a good approach to tackle this archiving process?

Basically, the way I want to store the data is to copy the document root properties as they are, and store the nested JSON as a serialized string.

For example, if I wanted to archive these 2 documents :

[
  {
    "identifier": "1st Guid here",
    "Contact": {
      "Name":  "John Doe",
      "Age": 99
    }
  },
  { 
    "identifier": "2nd Guid here",
    "Distributor": {
       "Name": "Jane Doe",
       "Phone": {
         "Number": "12345",
         "IsVerified": true
       }
    }
  }
]

I'd like these documents to be stored in Azure Table like this:

identifier      | Contact                                   | Distributor 
"Ist Guid here" | "{ \"Name\": \"John Doe\", \"Age\": 99 }" | null
"2nd Guid here" |  null                                     | "{\"Name\":\"Jane Doe\",\"Phone\":{\"Number\":\"12345\",\"IsVerified\":true}}"

Is this possible with the Copy Activity?

I tried using the mapping tab inside the CopyActivity, but when I try to run it I get an error saying that the dataType for one of the Nested JSON columns that are not present in the first row cannot be inferred.

Arreis
  • 43
  • 7

1 Answers1

1

Please follow my configuration in Mapping Tag.

enter image description here

Test output with your sample data:

enter image description here

Jay Gong
  • 23,163
  • 2
  • 27
  • 32
  • 1
    Thankyou Jay Cong I'm going to try this in a moment, but I wonder what if the schema can drift as time passes? I mean, what if for example, we add a new property to one of the CosmosDb documents, would the new column be ignored? Is there any way to "translate" only specific columns like the ones used for the partition key and row key, and move all the rest with their original name? – Arreis Mar 24 '20 at 18:31
  • 1
    Just to let you know, that your solution worked for me, thank you very much @jaygong. Now I only have to figure out what would happen with the Schema drift. – Arreis Apr 02 '20 at 18:33