I am trying to write a custom .NET activity, which will be run from Azure Data Factory. It will do two tasks, one after the other:
- it will download grib2 files from an FTP server daily (grib2 is a custom compression for meteorological data)
- it will decompress each file as it is downloaded.
So far I have setup an Azure Batch with a pool with two nodes - Windows Server machines, which are used to run the FTP downloads. The nodes are downloading the grib2 files to a blob storage container.
The code for the custom app so far looks like this:
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.Azure;
using Microsoft.Azure.Management.DataFactories.Models;
using Microsoft.Azure.Management.DataFactories.Runtime;
namespace ClassLibrary1
{
public class Class1 : IDotNetActivity
{
public IDictionary string, string Execute(
IEnumerable linkedServices,
IEnumerable datasets,
Activity activity,
IActivityLogger logger)
{
logger.Write("Start");
//Get extended properties
DotNetActivity dotNetActivityPipeline = (DotNetActivity)activity.TypeProperties;
string sliceStartString = dotNetActivityPipeline.ExtendedProperties["SliceStart"];
//Get linked service details
Dataset inputDataset = datasets.Single(dataset = dataset.Name == activity.Inputs.Single().Name);
Dataset outputDataset = datasets.Single(dataset = dataset.Name == activity.Outputs.Single().Name);
/*
DO FTP download here
*/
logger.Write("End");
return new Dictionary string, string();
}
}
}
So far my code works and I have the files downloaded to my blob storage account. Now that I have the files downloaded, I would like to have the nodes of the Batch pool decompress the files and put the decompressed files in my blob storage for further processing. For this, wgrib2.exe is used, which comes with some dll files. I have already zipped and uploaded the executable and all dll files it needs to the Application Packages to my pool. If I am correct, when each node joins the pool, this executable will be extracted and available for calls.
My question is: how do I go about to write the custom .NET activity so the files are downloaded by the nodes of my pool and after each file is downloaded, a decompression command is run on each file to convert it to csv file? The command line for this would look like:
wgrib2.exe downloadedfileName.grb2 -csv downloadedfileName.csv
How do I get a handle of the name of each downloaded file, how do I precess it on the node and save it back to the blob storage?
Also, how can I control how many files are downloaded at the same time and how many are decompressed at the same time?