I am struggling with this task: I have to download files from an SFTP server and then parse them. I am using Durable Functions like this:

    [FunctionName("MainOrch")]
    public async Task<List<string>> RunOrchestrator(
         [OrchestrationTrigger] IDurableOrchestrationContext context, ILogger log)
    {
        try
        {
            var filesDownloaded = new List<string>();
            var filesUploaded = new List<string>();
            var files = await context.CallActivityAsync<List<string>>("SFTPGetListOfFiles", null);
            log.LogInformation("!!!!FilesFound*******!!!!!" + files.Count);
            if (files.Count > 0)
            {
                foreach (var fileName in files)
                {
                    filesDownloaded.Add(await context.CallActivityAsync<string>("SFTPDownload", fileName));
                }
      

                var parsingTasks = new List<Task<string>>(filesDownloaded.Count);
                foreach (var downloaded in filesDownloaded)
                {
                    var parsingTask = context.CallActivityAsync<string>("PBARParsing", downloaded);
                    parsingTasks.Add(parsingTask);
                }
                await Task.WhenAll(parsingTasks);
            }
            return filesDownloaded;
        }
        catch (Exception ex)
        {

            throw;
        }

    }

SFTPGetListOfFiles: This function connects to the SFTP server, gets the list of files in a folder, and returns it.

SFTPDownload: This function is supposed to connect to the SFTP server, download one file into the Azure Function's temp storage, and return the download path. (Each file is between 10 and 60 MB.)

    [FunctionName("SFTPDownload")]
    public async Task<string> SFTPDownload([ActivityTrigger] string name, ILogger log, Microsoft.Azure.WebJobs.ExecutionContext context)
    {
        var downloadPath = "";
        try
        {
            using (var session = new Session())
            {
                try
                {
                    session.ExecutablePath = Path.Combine(context.FunctionAppDirectory, "winscp.exe");
                    session.Open(GetOptions(context));
                    log.LogInformation("!!!!!!!!!!!!!!Connected For Download!!!!!!!!!!!!!!!");

                    TransferOptions transferOptions = new TransferOptions();
                    transferOptions.TransferMode = TransferMode.Binary;
                    downloadPath = Path.Combine(Path.GetTempPath(), name);
                    log.LogInformation("Downloading " + name);
                    var transferResult = session.GetFiles("/Receive/" + name, downloadPath, false, transferOptions);
                    log.LogInformation("Downloaded " + name);
                    // Throw on any error
                    transferResult.Check();
                    log.LogInformation("!!!!!!!!!!!!!!Completed Download !!!!!!!!!!!!!!!!");
                }
                catch (Exception ex)
                {
                    log.LogError(ex.Message);
                }
                finally
                {
                    session.Close();
                }
            }
        }
        catch (Exception ex)
        {
            log.LogError(ex.Message);
            _traceService.TraceException(ex);
        }
        return downloadPath;
    }

PBARParsing: This function has to open a stream on that file and process it. (Processing a 60 MB file might take a few minutes when scaled up to S2 and scaled out to 10 instances.)

    [FunctionName("PBARParsing")]
    public async Task PBARParsing([ActivityTrigger] string pathOfFile,
    ILogger log)
    {

        // Take the file name from the path without splitting on a hard-coded separator
        var name = Path.GetFileName(pathOfFile);
        try
        {
            log.LogInformation("**********Starting" + name);
            Stream stream = File.OpenRead(pathOfFile);

I want all downloads via SFTPDownload to complete first, which is why the "await" is inside a loop. After that, I want the parsing to run in parallel.

Question 1: Does the code in the MainOrch function look correct for doing these three things: 1) getting the names of the files, 2) downloading them one by one without starting the parsing function until all files are downloaded, and 3) parsing the files in parallel?

I observed that what I described in Question 1 is working as expected.

Question 2: 30% of the files are parsed, and for the 80% I see errors saying "Could not find file 'D:\local\Temp\fileName'". Is the Azure Function removing the files after I place them? Is there any other approach I can take? If I change the path to "D:\home", I might see a "File is being used by another process" error, but I haven't tried it yet. Out of the 68 files on the SFTP server, weirdly, the last 20 ran and the first 40 files were not found at that path, and this is in sequence.

Question 3: I also see this error: "Singleton lock renewal failed for blob 'func-eres-integration-dev/host' with error code 409: LeaseIdMismatchWithLeaseOperation. The last successful renewal completed at 2020-08-08T17:57:10.494Z (46005 milliseconds ago) with a duration of 155 milliseconds. The lease period was 15000 milliseconds." Does it tell me something? It occurred only once, though.

Update: after switching to "D:\home", I am no longer getting file-not-found errors.

Raas Masood

1 Answer

For others coming across this: the temporary storage is local to a single instance of the function app, so when the app scales out, an activity may run on a different instance from the one that downloaded the file.
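One way to check whether this is what is happening is to log which instance each activity runs on. The sketch below is only an illustration: the `InstanceInfo` helper is a hypothetical name, and it assumes the `WEBSITE_INSTANCE_ID` environment variable that App Service sets on each instance, falling back to the machine name when it is absent (for example, when running locally).

```csharp
using System;

// Hypothetical helper (not part of the original code) for identifying
// the instance an activity function is currently executing on.
public static class InstanceInfo
{
    // Prefer the instance id when it is available; otherwise use the machine name.
    public static string Describe(string instanceId, string machineName)
    {
        return string.IsNullOrEmpty(instanceId) ? machineName : instanceId;
    }

    // Reads the values from the current process environment.
    // Assumption: WEBSITE_INSTANCE_ID is set by the hosting platform.
    public static string Current()
    {
        return Describe(
            Environment.GetEnvironmentVariable("WEBSITE_INSTANCE_ID"),
            Environment.MachineName);
    }
}
```

Calling something like `log.LogInformation("SFTPDownload running on " + InstanceInfo.Current());` in both SFTPDownload and PBARParsing should show whether the "Could not find file" errors line up with the two activities running on different instances.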

For such scenarios, D:\home is a better alternative because Azure Files is mounted there, and that share is the same across all instances.
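A minimal sketch of building the download path under the shared location instead of `Path.GetTempPath()` could look like the following. The `SharedStorage` class and the `data\sftp-downloads` subfolder are hypothetical names chosen for illustration, and the sketch assumes the `HOME` environment variable points at the shared home directory (D:\home in the question's environment); it falls back to the temp path when `HOME` is not set.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of the original code) that resolves a
// download path under the shared home directory rather than local temp storage.
public static class SharedStorage
{
    // homeDir is passed in explicitly so the logic can be exercised without
    // depending on the environment of the current process.
    public static string ResolveDownloadPath(string fileName, string homeDir)
    {
        // Keep only the file name so a remote name cannot escape the target folder.
        var safeName = Path.GetFileName(fileName);
        if (string.IsNullOrEmpty(safeName))
        {
            throw new ArgumentException("A file name is required.", nameof(fileName));
        }

        // Assumption: when no home directory is known, fall back to temp storage.
        var root = string.IsNullOrEmpty(homeDir) ? Path.GetTempPath() : homeDir;
        return Path.Combine(root, "data", "sftp-downloads", safeName);
    }

    // Convenience overload that reads HOME from the environment.
    public static string ResolveDownloadPath(string fileName)
    {
        return ResolveDownloadPath(fileName, Environment.GetEnvironmentVariable("HOME"));
    }
}
```

In SFTPDownload, `downloadPath = SharedStorage.ResolveDownloadPath(name);` would replace the `Path.GetTempPath()` line. The target folder has to exist before the transfer, for example via `Directory.CreateDirectory(Path.GetDirectoryName(downloadPath))`.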

As for the lock renewal error observed here, this issue tracks it, but as mentioned there it shouldn't cause problems. If you do see any problems because of it, it would be best to share the details in that issue.

PramodValavala