
I am trying to copy data from a SQL table in an on-prem SQL Server and upload it to DocumentDB using a custom activity in an Azure Data Factory pipeline. Can anyone tell me how I can accomplish this using IDotNetActivity or any other interface or class?

Maverik
  • This [article](https://azure.microsoft.com/en-gb/documentation/articles/data-factory-azure-documentdb-connector/) details copying from blob storage to DocumentDB. See if you can get that working then adapt for on-prem datasource. – wBob Oct 06 '16 at 15:57
  • Thank you for your reply, but I am trying to do the copy using a custom activity. – Maverik Oct 07 '16 at 04:31

4 Answers


Actually, a custom activity cannot access on-premises data today.

Similar question here: On-Prem SQL connection throwing SqlException in Datafactory custom activity

The solution is to copy the on-prem data to the cloud first, and then run the custom activity against that cloud storage. wBob shared a good sample above.
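For reference, a custom activity implements IDotNetActivity from the ADF v1 .NET SDK (Microsoft.Azure.Management.DataFactories). A minimal skeleton, with a placeholder class name and the blob-reading logic left as a comment, looks roughly like this:

using System.Collections.Generic;
using Microsoft.Azure.Management.DataFactories.Models;
using Microsoft.Azure.Management.DataFactories.Runtime;

// Placeholder class name; runs against cloud storage, not on-prem data.
public class CopyStagedDataActivity : IDotNetActivity
{
    public IDictionary<string, string> Execute(
        IEnumerable<LinkedService> linkedServices,
        IEnumerable<Dataset> datasets,
        Activity activity,
        IActivityLogger logger)
    {
        logger.Write("Reading staged rows from cloud storage...");
        // ... read the staged data from the input blob dataset here
        // and write it to DocumentDB ...
        return new Dictionary<string, string>();
    }
}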

If you have to complete it in one activity, you can set up a VNet and ExpressRoute to connect your Azure public cloud with your on-prem environment.

Charles Gu

I got this working with conventional Azure Data Factory (ADF) tasks; no custom task is required. I wouldn't make things more complicated than they need to be, particularly with these components, which can be hard to debug.

The following sample shows:

  1. A linked service of type OnPremisesSqlServer.
  2. A linked service of type DocumentDb.
  3. An input dataset of type SqlServerTable.
  4. An output dataset of type DocumentDbCollection.
  5. A pipeline with a Copy activity that uses SqlSource and DocumentDbCollectionSink.

Linked Service of type OnPremisesSqlServer:

{
    "name": "OnPremLinkedService",
    "properties": {
        "type": "OnPremisesSqlServer",
        "description": "",
        "typeProperties": {
            "connectionString": "Data Source=<servername - required for credential encryption>;Initial Catalog=<databasename - required for credential encryption>;Integrated Security=False;User ID=<username>;Password=<password>;",
            "gatewayName": "<Name of the gateway that the Data Factory service should use to connect to the on-premises SQL Server database - required for credential encryption>",
            "userName": "<Specify user name if you are using Windows Authentication>",
            "password": "<Specify password for the user account>"
        }
    }
}

Linked Service of type DocumentDb:

{
    "name": "DocumentDbLinkedService",
    "properties": {
        "type": "DocumentDb",
        "typeProperties": {
            "connectionString": "AccountEndpoint=<EndpointUrl>;AccountKey=<AccessKey>;Database=<Database>"
        }
    }
}

Input Dataset of type SqlServerTable:

{
    "name": "SQLServerDataset",
    "properties": {
        "structure": [
            {
                "name": "Id",
                "type": "Int32"
            },
            {
                "name": "FirstName",
                "type": "String"
            },
            {
                "name": "MiddleName",
                "type": "String"
            },
            {
                "name": "LastName",
                "type": "String"
            }
        ],
        "published": false,
        "type": "SqlServerTable",
        "linkedServiceName": "OnPremLinkedService",
        "typeProperties": {
            "tableName": "dbo.Users"
        },
        "availability": {
            "frequency": "Day",
            "interval": 1
        },
        "external": true,
        "policy": {}
    }
}

Output Dataset of type DocumentDbCollection:

{
    "name": "PersonDocumentDbTableOut",
    "properties": {
        "structure": [
            {
                "name": "Id",
                "type": "Int32"
            },
            {
                "name": "Name.First",
                "type": "String"
            },
            {
                "name": "Name.Middle",
                "type": "String"
            },
            {
                "name": "Name.Last",
                "type": "String"
            }
        ],
        "published": false,
        "type": "DocumentDbCollection",
        "linkedServiceName": "DocumentDbLinkedService",
        "typeProperties": {
            "collectionName": "Person"
        },
        "availability": {
            "frequency": "Day",
            "interval": 1
        }
    }
}

Pipeline with Copy activity using SqlSource and DocumentDbCollectionSink:

{
    "name": "PipelineTemplate 3",
    "properties": {
        "description": "On prem to DocDb test",
        "activities": [
            {
                "type": "Copy",
                "typeProperties": {
                    "source": {
                        "type": "SqlSource"
                    },
                    "sink": {
                        "type": "DocumentDbCollectionSink",
                        "writeBatchSize": 2,
                        "writeBatchTimeout": "00:00:00"
                    },
                    "translator": {
                        "type": "TabularTranslator",
                        "columnMappings": "id: id, FirstName: Name.First, MiddleName: Name.Middle, LastName: Name.Last"
                    }
                },
                "inputs": [
                    {
                        "name": "SQLServerDataset"
                    }
                ],
                "outputs": [
                    {
                        "name": "PersonDocumentDbTableOut"
                    }
                ],
                "policy": {
                    "timeout": "1.00:00:00",
                    "concurrency": 1,
                    "retry": 3
                },
                "scheduler": {
                    "frequency": "Day",
                    "interval": 1
                },
                "name": "CopyActivityTemplate"
            }
        ],
        "start": "2016-10-05T00:00:00Z",
        "end": "2016-10-05T00:00:00Z",
        "isPaused": false,
        "hubName": "adfdocdb2_hub",
        "pipelineMode": "Scheduled"
    }
}
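For reference, with the column mappings above the sink should produce nested documents in the Person collection along these lines (illustrative values only):

{
    "Id": 1,
    "Name": {
        "First": "John",
        "Middle": "Q",
        "Last": "Public"
    }
}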
wBob
  • I was able to get it working with the ADF configuration as mentioned by wBob. However, the requirement is to do it using a custom activity. – Maverik Oct 10 '16 at 09:19
  • That's interesting. Can you explain why you must use a custom activity? From experience these can be really difficult to debug when things go wrong, so doing it with the built-in tasks is normally preferable. Interested to hear more about your use case. Thanks. – wBob Oct 10 '16 at 09:27
  • wBob you are right about the difficulty in debugging. Currently the project is at a nascent stage. Will update here as to why we need a custom activity and the business need at a later stage of the project. Thanks for your quick response. – Maverik Oct 10 '16 at 09:33

I was able to solve the problem. The solution is to write the code in the custom activity itself to copy data from the on-prem SQL Server to DocumentDB, as in the code below:

// Requires references to System.Data and the DocumentDB .NET SDK
// (Microsoft.Azure.Documents.Client).
using System;
using System.Data;
using System.Data.SqlClient;
using System.Security;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

public async Task CopyDataFromTo(string source)
{
    DataTable dtSource = new DataTable();
    string endpointUrl = "https://yourendpoint.documents.azure.com:443/";
    string authorizationKey = "*****";

    // DocumentClient accepts the account key as a SecureString.
    SecureString authKey = new SecureString();
    foreach (char c in authorizationKey)
    {
        authKey.AppendChar(c);
    }

    // Pull the source rows from the on-prem SQL Server table.
    SqlDataAdapter adapSource = new SqlDataAdapter("SELECT * FROM YourTable", source);
    adapSource.Fill(dtSource);

    // Create the client once and reuse it for every row; creating a new
    // client per document is expensive.
    using (var client = new DocumentClient(new Uri(endpointUrl), authKey))
    {
        foreach (DataRow dr in dtSource.Rows)
        {
            var docFirst = new
            {
                UserID = Int32.Parse(dr["ColumnOne"].ToString()),
                UserAlias = dr["ColumnTwo"].ToString()
            };

            Document newDocument = await client.CreateDocumentAsync(
                UriFactory.CreateDocumentCollectionUri("DatabaseName", "CollectionName"),
                docFirst);
        }
    }
}
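To wire this into the custom activity, the Execute method of the IDotNetActivity implementation can pass the connection string in and block on the task. A sketch, assuming a hypothetical SqlConnectionString extended property configured on the pipeline's DotNetActivity:

public IDictionary<string, string> Execute(
    IEnumerable<LinkedService> linkedServices,
    IEnumerable<Dataset> datasets,
    Activity activity,
    IActivityLogger logger)
{
    // "SqlConnectionString" is a hypothetical extended property defined in
    // the pipeline JSON; substitute your own configuration mechanism.
    var dotNetActivity = (DotNetActivity)activity.TypeProperties;
    string connectionString = dotNetActivity.ExtendedProperties["SqlConnectionString"];

    // Execute returns synchronously, so block on the async copy.
    CopyDataFromTo(connectionString).Wait();

    return new Dictionary<string, string>();
}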
Maverik

Thanks Charles. Turns out you are right. The solution I implemented was:

Part 1:

Implemented a data factory pipeline to move data from on-prem databases to staging DocumentDB collections.

Part 2:

Used a custom activity to combine data from the different staged collections in DocumentDB into a new DocumentDB collection containing the required output data.
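As a rough sketch, that combining step inside the custom activity could look something like the following; the database, collection, and property names are hypothetical, and AsEnumerable requires System.Linq:

// Sketch: read documents from a staged collection and write combined
// documents to an output collection. Error handling omitted for brevity.
public async Task CombineStagedCollections(DocumentClient client)
{
    var users = client.CreateDocumentQuery<Document>(
            UriFactory.CreateDocumentCollectionUri("StagingDb", "Users"))
        .AsEnumerable();

    foreach (Document user in users)
    {
        // Shape the combined output document; in practice you would join
        // in fields looked up from the other staged collections here.
        var combined = new
        {
            UserID = user.GetPropertyValue<int>("UserID"),
            UserAlias = user.GetPropertyValue<string>("UserAlias")
        };

        await client.CreateDocumentAsync(
            UriFactory.CreateDocumentCollectionUri("OutputDb", "Combined"),
            combined);
    }
}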

Maverik