
I am trying to write a custom .NET activity, which will be run from Azure Data Factory. It will do two tasks, one after the other:

  1. it will download grib2 files from an FTP server daily (grib2 is a custom compression for meteorological data)
  2. it will decompress each file as it is downloaded.

So far I have set up an Azure Batch account with a pool of two nodes (Windows Server machines), which are used to run the FTP downloads. The nodes download the grib2 files into a blob storage container.

The code for the custom app so far looks like this:

using System;
using System.Collections.Generic;
using System.Linq;

using Microsoft.Azure;
using Microsoft.Azure.Management.DataFactories.Models;
using Microsoft.Azure.Management.DataFactories.Runtime;

namespace ClassLibrary1
{
    public class Class1 : IDotNetActivity
    {
        public IDictionary<string, string> Execute(
                IEnumerable<LinkedService> linkedServices,
                IEnumerable<Dataset> datasets,
                Activity activity,
                IActivityLogger logger)
        {
            logger.Write("Start");

            //Get extended properties
            DotNetActivity dotNetActivityPipeline = (DotNetActivity)activity.TypeProperties;

            string sliceStartString = dotNetActivityPipeline.ExtendedProperties["SliceStart"];

            //Get the input and output datasets
            Dataset inputDataset = datasets.Single(dataset => dataset.Name == activity.Inputs.Single().Name);
            Dataset outputDataset = datasets.Single(dataset => dataset.Name == activity.Outputs.Single().Name);

            /*
                DO FTP download here
            */

            logger.Write("End");

            return new Dictionary<string, string>();
        }
    }
} 

So far my code works and the files are downloaded to my blob storage account. Now I would like the nodes of the Batch pool to decompress each file and put the decompressed files in my blob storage for further processing. For this, wgrib2.exe is used, which comes with a few DLL files. I have already zipped the executable together with all the DLLs it needs and uploaded the archive as an Application Package to my pool. If I understand correctly, when each node joins the pool, this package is extracted and the executable becomes available to call.
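
If that is right, I assume the activity can locate the extracted package on the node through the environment variable Batch sets for application packages, something like the sketch below (the application id "wgrib2" and version "1.0" are placeholders for whatever I used when uploading the package):

//Sketch only: locate wgrib2.exe inside the application package on the node.
//On Windows nodes Batch exposes each package's extraction folder through an
//environment variable named AZ_BATCH_APP_PACKAGE_<application id>#<version>.
string packageDir = Environment.GetEnvironmentVariable("AZ_BATCH_APP_PACKAGE_wgrib2#1.0");
string wgrib2Path = System.IO.Path.Combine(packageDir, "wgrib2.exe");
logger.Write("wgrib2 located at {0}", wgrib2Path);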

My question is: how do I go about writing the custom .NET activity so that the files are downloaded by the nodes of my pool and, after each file is downloaded, a decompression command is run on it to convert it to a csv file? The command line for this would look like:

wgrib2.exe downloadedfileName.grb2 -csv downloadedfileName.csv 
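
What I have in mind is to launch that command from inside Execute with System.Diagnostics.Process right after each download, roughly like this (the paths are placeholders, and wgrib2Path would come from the application package as sketched above):

//Sketch only: run wgrib2 on one downloaded file and wait for it to finish.
string wgrib2Path = @"C:\path\to\wgrib2.exe";              //placeholder: resolved from the application package
string localGrb2 = "downloadedfileName.grb2";              //placeholder: file just downloaded from the FTP server
string localCsv = System.IO.Path.ChangeExtension(localGrb2, ".csv");

var startInfo = new System.Diagnostics.ProcessStartInfo
{
    FileName = wgrib2Path,
    Arguments = string.Format("\"{0}\" -csv \"{1}\"", localGrb2, localCsv),
    UseShellExecute = false
};

using (var process = System.Diagnostics.Process.Start(startInfo))
{
    process.WaitForExit();
    logger.Write("wgrib2 exited with code {0}", process.ExitCode);
}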

How do I get hold of the name of each downloaded file, how do I process it on the node, and how do I save the result back to blob storage?
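
For the upload part, my current idea is to use the storage SDK directly from the activity, taking the connection string from the output dataset's linked service, along these lines (the container name "decompressed" is a placeholder and localCsv is the file produced by wgrib2):

//Sketch only: push one converted csv back to blob storage.
//Assumes the output dataset points to an Azure Storage linked service
//and that the WindowsAzure.Storage package is referenced.
string localCsv = "downloadedfileName.csv";                //placeholder: output of the wgrib2 step

AzureStorageLinkedService outputStorage = (AzureStorageLinkedService)linkedServices
    .First(ls => ls.Name == outputDataset.Properties.LinkedServiceName)
    .Properties.TypeProperties;

var account = Microsoft.WindowsAzure.Storage.CloudStorageAccount.Parse(outputStorage.ConnectionString);
var container = account.CreateCloudBlobClient().GetContainerReference("decompressed");   //placeholder container
container.CreateIfNotExists();

var blob = container.GetBlockBlobReference(System.IO.Path.GetFileName(localCsv));
using (var fileStream = System.IO.File.OpenRead(localCsv))
{
    blob.UploadFromStream(fileStream);
}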

Also, how can I control how many files are downloaded at the same time and how many are decompressed at the same time?
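
On the concurrency point, one approach I am considering is to keep both loops inside the activity and throttle them myself, for example with Parallel.ForEach and MaxDegreeOfParallelism (the limit of 2 and the fileNames list are placeholders):

//Sketch only: cap how many files one node downloads and decompresses at a time.
//Needs using System.Threading.Tasks; at the top of the file.
var fileNames = new List<string>();                        //placeholder: grib2 file names discovered on the FTP server

Parallel.ForEach(
    fileNames,
    new ParallelOptions { MaxDegreeOfParallelism = 2 },    //placeholder limit per node
    fileName =>
    {
        //1. FTP download of fileName into the task's working directory
        //2. run wgrib2.exe on the downloaded file (as in the sketch above)
        //3. upload the resulting csv to blob storage
    });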

FeodorG
  • For how to use Azure Batch to unzip a large number of files, please refer to this [blog](http://www.johndehavilland.com/blog/2016/04/23/Using-Azure-Batch-to-unzip-large-number-of-files.html). – Tom Sun - MSFT Aug 07 '17 at 08:37
  • This is a great blog post, however it relies on being able to write my own code for the executable. In my case I need to run a 3rd party executable with its parameters. I already downloaded the solution from the blog you mentioned and managed to upload the wgrib2.exe and DLL files to the node. However, I do not know how to download the processed files from the node back to my blob storage. Any ideas? – FeodorG Aug 09 '17 at 10:55
  • I am also a .NET developer, but I used Python to handle grb2 (.bz2) files. I host the code in a Docker container and run the container in an Azure Function. – Jorn.Beyers May 05 '21 at 10:01

0 Answers