
I've just started working with Data Lake, and I'm trying to figure out what the real workflow steps are and how to automate the whole process. Say I have some input files that I want to process, and then I want to download the output files so I can push them into my data warehouse and/or SSAS.

I've found a lovely API and it works well, but I can't find a way to list all the file names in a directory so that I can download them afterwards.

Please correct me if my thinking about the workflow is wrong. Is there another, more elegant way to automatically get all the processed data (outputs) into storage (for example, a conventional SQL Server, SSAS, or a data warehouse)?

If you have a working solution based on Data Lake, please describe the workflow (from "raw" files to reports for end users) in a few words.

Here is my example .NET Core application:

using Microsoft.Azure.DataLake.Store;
using Microsoft.IdentityModel.Clients.ActiveDirectory;
using Microsoft.Rest.Azure.Authentication;

// Authenticate as a service principal (application ID + secret) in the given tenant.
var creds = new ClientCredential(ApplicationId, Secret);
var clientCreds = ApplicationTokenProvider.LoginSilentAsync(Tenant, creds).GetAwaiter().GetResult();

// Create a client for the Data Lake Store account.
var client = AdlsClient.CreateClient("myfirstdatalakeservice.azuredatalakestore.net", clientCreds);

// Returns metadata for /mynewfolder itself, not a listing of the files inside it.
var result = client.GetDirectoryEntry("/mynewfolder", UserGroupRepresentation.ObjectID);
Vladimir Semashkin

1 Answer


Say I have some files as an input and I would like to process them and download output files in order to push into my data warehouse or/and SSAS.

If you want to download the files in a Data Lake Store folder to a local path, you can use the following code:

client.BulkDownload("/mynewfolder", @"D:\Tom\xx"); //local path
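If you would rather list the file names yourself (for example, to choose which outputs to load or to log each one), a minimal sketch along these lines should work. It assumes the Microsoft.Azure.DataLake.Store NuGet package and the authenticated `client` created in the question; `DataLakeDownloader` and `DownloadFolder` are hypothetical names used only for illustration. The key call is `EnumerateDirectory`, which returns the children of a folder, whereas `GetDirectoryEntry` returns metadata for the path itself.

```csharp
using System;
using System.IO;
using Microsoft.Azure.DataLake.Store;

// Hypothetical helper class, not part of the SDK.
static class DataLakeDownloader
{
    // Downloads every file directly under remoteFolder into localFolder, one file at a time.
    // Subfolders are skipped; call the method recursively if you need them.
    public static void DownloadFolder(AdlsClient client, string remoteFolder, string localFolder)
    {
        Directory.CreateDirectory(localFolder);

        // EnumerateDirectory lists the entries inside remoteFolder.
        foreach (DirectoryEntry entry in client.EnumerateDirectory(remoteFolder))
        {
            if (entry.Type != DirectoryEntryType.FILE)
            {
                continue; // skip subdirectories
            }

            Console.WriteLine($"{entry.FullName} ({entry.Length} bytes)");

            // Copy the remote file's contents into a local file with the same name.
            string localPath = Path.Combine(localFolder, entry.Name);
            using (Stream remote = client.GetReadStream(entry.FullName))
            using (FileStream local = File.Create(localPath))
            {
                remote.CopyTo(local);
            }
        }
    }
}
```

Usage would be something like `DataLakeDownloader.DownloadFolder(client, "/mynewfolder", @"D:\Tom\xx");`. `BulkDownload` is simpler when you just want everything; the manual loop is only worth it if you need per-file control.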

That said, as I understand it, you could also use Azure Data Factory to copy your data from Data Lake Store to Azure Blob storage or Azure File storage.

Tom Sun - MSFT
  • Thanks! Good point about Data Factory; I will definitely look into it. To be fair, I would like to see some steps (maybe non-trivial ones) for getting data from Data Lake into storage outside Azure. I'm just wondering what solutions already exist. The thing is, I can't move the entire system to the cloud at once, so I have to keep the storage and some other parts outside for a while. – Vladimir Semashkin May 03 '18 at 09:45
  • As far as I know, Data Factory also supports moving data from Data Lake to a file system. – Tom Sun - MSFT May 04 '18 at 05:35