
--UPDATE-- Issue sorted out by following the link, in the comments below, to another post

I'm new to ADF, and even though I have created some simple pipelines before, this one is proving very tricky.

I have a file share with files and pictures from jobs with the naming convention: [A or B for before/after]-[Site in numbers]-[Work number].[jpg or jpeg]

I want to select only the pictures from the file share and copy them to my blob storage, creating folders in the blob dynamically: for example, taking the [work number] from the picture name, creating a folder with that number, and saving in that folder any pictures with the same work number.
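To make the goal concrete, here is an illustration with made-up file names (not actual data), showing the folder each picture should end up in:

A-1234-456.jpg  ->  456/A-1234-456.jpg
B-1234-456.jpg  ->  456/B-1234-456.jpg
A-1234-789.jpg  ->  789/A-1234-789.jpg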

I have successfully connected to my file share and blob storage, created my datasets as binary, and moved pictures across by typing the path and file name into the copy activity, so the connectivity is there.

The issue is that there are roughly 1 million pictures and I want to automate this process with wildcards, but I'm having a hard time with the dynamic expressions in ADF... any help with extracting and manipulating the name of each picture to achieve something like this would be appreciated!

--UPDATE WITH IMAGES AND CLARIFICATION--

I'm trying to dynamically create and fill folders with a pipeline. My dataset is a list of pictures with the naming convention:

[A or B for before/after]-[Site in numbers]-[Work number].[jpeg]

I created a working pipeline like this, getting the metadata of the source folder (pic1).

Using the childItems output from the Get Metadata activity, I create a ForEach activity that iterates over each file name (pic2).

I created two variables in the pipeline to set the folder name and change the order of the information in the filename; item().name is the current item of the ForEach iteration (pic3, pic4). A sketch of the kind of expressions involved is shown below.
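For reference, a rough sketch of the kind of expressions those Set Variable activities could use, assuming the filename format above (the variable names FolderName and NewFileName are made up for illustration, not the exact ones in the screenshots):

FolderName:  @{split(split(item().name,'.')[0],'-')[2]}
NewFileName: @{concat(split(split(item().name,'.')[0],'-')[2],'-',split(item().name,'-')[0],'-',split(item().name,'-')[1],'.',split(item().name,'.')[1])}

For A-1234-456.jpg this would give a FolderName of 456 and a NewFileName of 456-A-1234.jpg.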

Up until this point everything is working great. The issue is that the copy activity is overwriting every newly created folder and file until I'm left with a single folder and file.

pic5

pic6

As seen in the picture below, the data is being copied successfully, just overwritten. I will have 4-8 pictures per work number, so ideally there should be several folders, one per work number, each containing the pictures associated with it. Any help on how to avoid this overwriting issue is greatly appreciated.


Joshbg
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Sep 07 '21 at 19:20

1 Answer


Use the Get Metadata activity to get all your file names as childItems, then loop over them and store each name in a variable that you can use in the sink dataset path.

The Get Metadata activity can be used to retrieve metadata for any data in Azure Data Factory. The metadata returned by the Get Metadata activity can be used in conditional expressions to perform validation, or consumed in subsequent activities.

Get Metadata activity in Azure Data Factory | Docs
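For context, a minimal sketch of how that output is usually wired into the loop (the activity name Get Metadata1 is an assumption):

ForEach Items setting:           @activity('Get Metadata1').output.childItems
Current file name inside loop:   @item().name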

split(split('a-2344-456.jpg','.')[0],'-')[2]

You can use the above dynamic expression to get the work number, and then use that value (via a variable) in the sink dataset path, as broken down below.
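Evaluated step by step against the sample name (ADF arrays are zero-indexed):

split('a-2344-456.jpg','.')[0]  ->  'a-2344-456'
split('a-2344-456','-')[2]      ->  '456'

To keep each iteration from overwriting the previous one, the sink binary dataset can be parameterised for both the folder and the file name, with the values supplied per iteration from the copy activity sink (the parameter names WorkNumber and FileName, the container name pictures, and the variable name FolderName are assumptions):

Sink dataset folder path:   pictures/@{dataset().WorkNumber}
Sink dataset file name:     @{dataset().FileName}
Copy activity sink values:  WorkNumber = @variables('FolderName'), FileName = @item().name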

IpsitaDash-MT
  • I will try this approach and update you! The current issue is that the Get Metadata activity is reaching its maximum output and crashing the pipeline, so I will need to partition this into several runs – Joshbg Sep 08 '21 at 03:32
  • thank you for the update and let me know, happy to help :) – IpsitaDash-MT Sep 08 '21 at 03:48
  • Updated the question with more details and images. Your process should be working but it is overwriting the results of the pipeline; I'm sure I'm missing something very simple... – Joshbg Sep 09 '21 at 04:11
  • Found this link and sorted the issue; the OP ignored the warning on the DS set-ups: https://stackoverflow.com/questions/66680167/copy-files-of-different-formats-in-different-folders-based-using-azure-data-fact – Joshbg Sep 09 '21 at 06:44