I am migrating (extract-load) a large dataset to a LOB service and would like to use Azure Data Factory v2 (ADF v2). This would be the cloud version of the same kind of orchestration typically implemented in SSIS. My source database and dataset, as well as the target platform, are on Azure. That led me to ADF v2 with Azure Batch Service (ABS) and creating a custom activity.
https://learn.microsoft.com/en-us/azure/data-factory/transform-data-using-dotnet-custom-activity
However, I cannot tell from the documentation or the samples provided by Microsoft how ADF v2 creates the job and tasks needed by the Batch service.
As an example, let's say I have a dataset with 10 million records and a Batch pool with 10 cores. How do I submit one tenth of the rows (or even a single row) to each instance of my command-line app running on the cores in the pool? How do I distribute the work? Following the default walkthrough in the ADF v2 docs, I just get a datasets.json file, and it is identical on every pool node; there is no "slice" or subset information.
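To make that concrete, this is roughly how I would expect the app on a node to pick up its slice. ADF drops activity.json (along with linkedServices.json and datasets.json) into the task's working directory, so in principle the app could read slice boundaries from extendedProperties set on the Custom activity. The sliceStart/sliceCount keys below are hypothetical properties I would have to define myself, not something ADF provides:

```python
import json

# Sketch only: read the activity.json that ADF places in the task's
# working directory and pull slice boundaries out of extendedProperties.
# "sliceStart" and "sliceCount" are hypothetical keys I would set on the
# Custom activity myself; ADF adds no slice information by default.
with open("activity.json") as f:
    activity = json.load(f)

props = activity["typeProperties"].get("extendedProperties", {})
offset = int(props.get("sliceStart", 0))
count = int(props.get("sliceCount", 0))
print(f"would process rows {offset}..{offset + count - 1}")
```

But even then, every node would receive the same values, which is exactly my problem.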
If ADF v2 were not involved, I would create a job in ABS and, for each row (or each batch of X rows), create a task. The nodes would then work through the tasks one after another. How do I achieve something similar with ADF v2?
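For reference, here is a minimal sketch of that direct approach using the Azure Batch Python SDK. The account details, pool name, worker executable, and chunk size are all placeholders:

```python
from azure.batch import BatchServiceClient
from azure.batch.batch_auth import SharedKeyCredentials
import azure.batch.models as batchmodels

# Placeholder credentials and endpoint for illustration.
credentials = SharedKeyCredentials("mybatchaccount", "<account-key>")
client = BatchServiceClient(
    credentials, "https://mybatchaccount.westeurope.batch.azure.com")

# One job on an existing pool (pool id is a placeholder).
job_id = "migration-job"
client.job.add(batchmodels.JobAddParameter(
    id=job_id,
    pool_info=batchmodels.PoolInformation(pool_id="migration-pool")))

# One task per chunk of rows; the worker app gets its slice on the command line.
total_rows = 10_000_000
chunk = 1_000_000  # 1/10 of the dataset per task
tasks = [
    batchmodels.TaskAddParameter(
        id=f"chunk-{offset}",
        command_line=f"cmd /c MyWorker.exe --offset {offset} --count {chunk}")
    for offset in range(0, total_rows, chunk)
]
client.task.add_collection(job_id, tasks)  # Batch schedules these across nodes
```

With that, Batch itself spreads the 10 tasks across the nodes in the pool; my question is how to get ADF v2 to produce an equivalent fan-out.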