I am working on a backup strategy for Data Lake Store (DLS). My plan is to create two DLS accounts and copy data between them. I have evaluated several approaches, but none of them satisfies the requirement to preserve the POSIX ACLs (permissions, in DLS parlance). The PowerShell cmdlets require the data to be downloaded from the primary DLS onto a VM and re-uploaded to the secondary DLS. The AdlCopy tool works only on Windows 10, does not preserve permissions, and does not support copying data across regions either (not that this is a hard requirement). Data Factory seemed like the most sensible approach until I realized that it doesn't preserve permissions either.

That leaves my last option: DistCp. According to the DistCp guide (https://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html), the tool supports preserving permissions. The downside of DistCp, however, is that it must be run from HDInsight. Although it supports both intra- and inter-cluster copying, I would rather not keep an HDInsight cluster running just for backup operations.

Am I missing something? Does anyone have a better suggestion?
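For reference, the DistCp option that matters here is `-p[rbugpcaxt]`, where `u`, `g`, `p` and `a` preserve user, group, permission and ACL respectively. The minimal Python sketch below only assembles such a command line; the account names `primaryaccount`/`secondaryaccount` and the helper `build_distcp_command` are hypothetical, and it assumes the cluster running DistCp has the `adl://` connector configured for both accounts.

```python
import shlex

# Hypothetical account names; substitute your own primary and secondary accounts.
PRIMARY = "adl://primaryaccount.azuredatalakestore.net"
SECONDARY = "adl://secondaryaccount.azuredatalakestore.net"

VALID_PRESERVE_FLAGS = set("rbugpcaxt")  # letters accepted by DistCp's -p option


def build_distcp_command(src, dst, preserve="ugpa", update=True):
    """Return the argv for a DistCp run that preserves the requested attributes.

    Defaults: u = user, g = group, p = permission, a = ACL.
    """
    if not preserve or not set(preserve) <= VALID_PRESERVE_FLAGS:
        raise ValueError(f"invalid -p flags: {preserve!r}")
    argv = ["hadoop", "distcp", f"-p{preserve}"]
    if update:
        # Only copy files that are missing or differ on the target.
        argv.append("-update")
    argv += [src, dst]
    return argv


if __name__ == "__main__":
    cmd = build_distcp_command(f"{PRIMARY}/data", f"{SECONDARY}/data")
    print(shlex.join(cmd))  # shlex.join needs Python 3.8+
```

One caveat from the DistCp guide worth keeping in mind for incremental backups: when `-update` is specified, status updates (such as changed permissions or ACLs) are not synchronized unless the file sizes also differ, so ACL changes on otherwise unchanged files would not reach the secondary account.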
Your assessment is comprehensive: those are indeed the options available if you want to copy over permissions, so you will have to choose one of them, sorry. If you truly want a serverless option that copies over the permissions, Azure Data Factory would have to be it. Could you please create a feedback item at https://feedback.azure.com/forums/270578-data-factory?
Thanks,
Sachin Sheth
Program Manager, Azure Data Lake

Hello Sachin. According to https://learn.microsoft.com/en-us/azure/data-lake-store/data-lake-store-migration-cross-region, Data Factory doesn't copy ACLs either: 'Keep in mind that Data Factory copies only the folder hierarchy and content of the files. You need to manually apply any access control lists (ACLs) that you use in the old account to the new account.' Are you advising me to open a feature request for preserving ACLs in Data Factory? – MrG Apr 04 '18 at 07:17
Hi Georgi, yes, I am aware that Azure Data Factory does *not* copy ACLs, hence my request that you open a feature request for it. That way, like-minded people in the community can upvote the request, which will bump it up in our prioritization. Sorry for the inconvenience. – Sachin Sheth Apr 04 '18 at 18:03
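Until such a feature exists, the documentation quoted in the comments says the ACLs have to be reapplied manually after a Data Factory copy. One way to script that step is through the WebHDFS-compatible `SETACL` operation, which takes a comma-separated `aclspec` such as `user::rwx,group::r-x,other::---`. The Python sketch below only builds the request URLs. It assumes you already captured a snapshot of the ACLs from the primary account (the `SNAPSHOT` data and the account name `secondaryaccount` are hypothetical), and it assumes the Gen1 endpoint layout `https://<account>.azuredatalakestore.net/webhdfs/v1/`. Authentication (an Azure AD bearer token) and actually sending the PUT requests are left out.

```python
from urllib.parse import quote, urlencode

# Hypothetical snapshot of ACLs captured from the primary account before the copy.
# Entry syntax follows the WebHDFS aclspec format: [default:]type:name:perms
SNAPSHOT = {
    "/data/sales": ["user::rwx", "group::r-x", "other::---", "user:alice:r-x"],
    "/data/sales/2018": ["user::rwx", "group::r-x", "other::---",
                         "default:user:alice:r-x"],
}


def acl_spec(entries):
    """Minimally validate ACL entries and join them into one aclspec string."""
    for entry in entries:
        parts = entry.split(":")
        # 'user:alice:r-x' has 3 fields; 'default:user:alice:r-x' has 4.
        if len(parts) not in (3, 4) or (len(parts) == 4 and parts[0] != "default"):
            raise ValueError(f"malformed ACL entry: {entry!r}")
    return ",".join(entries)


def setacl_url(account, path, entries):
    """Build the URL for a SETACL call (sent as HTTP PUT) against an assumed
    Data Lake Store Gen1 WebHDFS endpoint. No authentication is included."""
    base = f"https://{account}.azuredatalakestore.net/webhdfs/v1/"
    query = urlencode({"op": "SETACL", "aclspec": acl_spec(entries)})
    return base + quote(path.lstrip("/")) + "?" + query


if __name__ == "__main__":
    for path, entries in SNAPSHOT.items():
        print("PUT", setacl_url("secondaryaccount", path, entries))
```

Note that `SETACL` replaces the entire ACL on a path, which suits restoring a full snapshot; `MODIFYACLENTRIES` would instead merge entries into the existing ACL.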