I'd like to set up an Azure Data Factory pipeline which performs a move (i.e. copy, verify, delete) operation rather than just a copy operation between Blob Storage and a Data Lake Store. I cannot seem to find any detail on how to do this.

Sam

3 Answers

Azure Data Factory does not have a built-in activity or option to move files rather than copy them. You can, however, do this with a Custom Activity.

This example on GitHub shows how to do this with Azure Blob storage:

...
blob.DeleteIfExists();
...

https://github.com/Azure/Azure-DataFactory/tree/master/Samples/DeleteBlobFileFolderCustomActivity

If you feel this is an important feature, please add a feedback request:

https://feedback.azure.com/forums/270578-data-factory

Update: a built-in Delete activity has since been added:

https://azure.microsoft.com/en-us/blog/clean-up-files-by-built-in-delete-activity-in-azure-data-factory/

wBob
  • Rather painful that this requires me to build a DLL... I'll give it a go though, thank you. – Sam Jan 20 '17 at 01:29
  • Consider making the feature request and post a link here. I'll vote for it, plus others landing at the page might too. Also if you get some code working, feel free to post it and mark as an answer yourself; I think that would be genuinely useful. – wBob Jan 21 '17 at 10:41

Just to add a contemporary update for anyone coming across this.

Data Factory V2 has recently released a dedicated Delete activity.

At the time of writing, it supports:

  • Azure Blob storage
  • Azure Data Lake Storage Gen1
  • Azure Data Lake Storage Gen2
  • File System
  • FTP
  • SFTP
  • Amazon S3

The activity is defined as follows:
{
    "name": "DeleteActivity",
    "type": "Delete",
    "typeProperties": {
        "dataset": {
            "referenceName": "<dataset name>",
            "type": "DatasetReference"
        },
        "recursive": true/false,
        "maxConcurrentConnections": <number>,
        "enableLogging": true/false,
        "logStorageSettings": {
            "linkedServiceName": {
                "referenceName": "<name of linked service>",
                "type": "LinkedServiceReference"
            },
            "path": "<path to save log file>"
        }
    }
}

Taken from: https://learn.microsoft.com/en-gb/azure/data-factory/delete-activity
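
For the move scenario in the question, one way to combine this with a Copy activity is to make the Delete activity depend on the copy succeeding. The following is a minimal sketch, not taken from the linked documentation: it builds the pipeline definition as a Python dict and prints it as JSON. The pipeline, activity and dataset names are hypothetical, and the `BlobSource` / `AzureDataLakeStoreSink` types assume a Blob Storage source and a Data Lake Store Gen1 sink. It treats a successful copy as the only verification step, so check the property names and any extra validation options against the current Copy activity documentation before relying on it.

```python
import json


def build_move_pipeline(source_dataset: str, sink_dataset: str) -> dict:
    """Return a pipeline definition that copies, then deletes the source on success."""
    copy_activity = {
        "name": "CopyToLake",  # hypothetical activity name
        "type": "Copy",
        "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
        "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
        "typeProperties": {
            # Assumed source/sink types for Blob Storage -> Data Lake Store Gen1.
            "source": {"type": "BlobSource"},
            "sink": {"type": "AzureDataLakeStoreSink"},
        },
    }
    delete_activity = {
        "name": "DeleteSource",  # hypothetical activity name
        "type": "Delete",
        # Run only if the copy succeeded, so a failed copy never loses data.
        "dependsOn": [
            {"activity": copy_activity["name"], "dependencyConditions": ["Succeeded"]}
        ],
        "typeProperties": {
            "dataset": {"referenceName": source_dataset, "type": "DatasetReference"},
            "recursive": True,
            "enableLogging": False,
        },
    }
    return {
        "name": "MoveBlobToLake",  # hypothetical pipeline name
        "properties": {"activities": [copy_activity, delete_activity]},
    }


if __name__ == "__main__":
    pipeline = build_move_pipeline("SourceBlobDataset", "SinkLakeDataset")
    print(json.dumps(pipeline, indent=4))
```

The Delete activity points at the same dataset the Copy activity reads from, which is what turns the two steps into a move.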

Alex KeySmith
  • In 2020, this should be the right answer. Microsoft also released a template for moving files at the end of 2019. I suggest adding it to your answer: https://learn.microsoft.com/en-us/azure/data-factory/solution-template-move-files – Original BBQ Sauce Feb 18 '20 at 15:10

I'm from the ADF product team. While we're working on "Delete" as a first-class activity in ADF, we have published a sample on GitHub showing how users can delete files (in this case, Azure blobs) once they've been copied using the ADF Copy activity.

https://github.com/Azure/Azure-DataFactory/tree/master/Samples/DeleteBlobFileFolderCustomActivity

This is possible using the ADF custom .NET activity. The sample showcases the following:

  • A C# file which can be used as part of an ADF custom .NET activity to delete particular blobs or an entire folder.
  • Users need to provide the Azure Blob datasets to be deleted as a comma-separated list in the 'inputToDelete' extended property in the pipeline JSON. The custom .NET activity retrieves each dataset's FolderPath and filename properties. If only FolderPath is specified, it deletes all the contents of the blob folder.
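
The selection logic described above can be sketched as follows. This is not the sample's C# code: it is a Python illustration that uses an in-memory dict in place of a blob container, and the `datasets` mapping with its `folder_path` and `file_name` keys is a hypothetical stand-in for the dataset FolderPath and filename properties.

```python
def blobs_to_delete(input_to_delete, datasets, blob_names):
    """Return the blob names selected by a comma-separated dataset list."""
    selected = []
    for name in (part.strip() for part in input_to_delete.split(",")):
        if not name:
            continue  # tolerate stray commas or whitespace
        dataset = datasets[name]
        folder = dataset["folder_path"].rstrip("/") + "/"
        file_name = dataset.get("file_name")
        if file_name:
            # Folder and file name given: select that single blob.
            target = folder + file_name
            selected.extend(b for b in blob_names if b == target)
        else:
            # Only the folder given: select everything under it.
            selected.extend(b for b in blob_names if b.startswith(folder))
    return selected


def delete_blobs(container, input_to_delete, datasets):
    """Delete the selected blobs from a dict-based container; return their names."""
    doomed = blobs_to_delete(input_to_delete, datasets, list(container))
    for blob in doomed:
        container.pop(blob, None)  # a missing blob is not an error
    return doomed


if __name__ == "__main__":
    container = {"in/a.csv": b"1", "in/b.csv": b"2", "archive/old.csv": b"3"}
    datasets = {
        "InputFile": {"folder_path": "in", "file_name": "a.csv"},
        "ArchiveFolder": {"folder_path": "archive/"},
    }
    deleted = delete_blobs(container, "InputFile, ArchiveFolder", datasets)
    print(deleted)            # ['in/a.csv', 'archive/old.csv']
    print(sorted(container))  # ['in/b.csv']
```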

Contents of the GitHub repo:

  • DeleteFromBlobActivity.cs - C# file to be used as part of an ADF custom .NET activity to delete blobs or blob folders.
  • PipelineSample.json - Shows how to invoke the ADF custom .NET delete blob activity. Replace the placeholders for dataset names, schedule and linked services in the sample pipeline JSON.

Sharon Lo