
Example

sameer/student/land/ (compressed files)
sameer/student/pro/ (uncompressed files)

sameer/employee/land/ (compressed files)
sameer/employee/pro/ (uncompressed files)

In the above example, I need to read the files from all the LAND folders present in the different sub-directories, process them, and place the output in the PRO folder within the same sub-directory.

For this I have used two GCS nodes, one as the source and another as the sink.

In the GCS source I have provided the path gs://sameer/. It reads the files from all the sub-folders, but it merges them into one file and places it in the sink path.


Expected output: all files should be placed in the sub-directories they were fetched from.

I can achieve the expected output by running the pipeline separately for each folder.

What I am asking is whether this is possible in a single pipeline run.
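To make the expected behaviour concrete, here is a minimal sketch of it outside Data Fusion, using the google-cloud-storage Python client. The bucket name sameer, the <sub-dir>/land/<file> layout, and gzip compression are assumptions taken from the example above, not part of an actual pipeline:

```python
# Minimal sketch (not a Data Fusion pipeline) of the expected behaviour:
# read every file under <sub-dir>/land/, decompress it, and write the
# result into the sibling <sub-dir>/pro/ folder. The bucket name "sameer"
# and gzip compression are assumptions taken from the example above.
import gzip

from google.cloud import storage

client = storage.Client()
bucket = client.bucket("sameer")

for blob in client.list_blobs("sameer"):
    parts = blob.name.split("/")          # e.g. ["student", "land", "data.csv.gz"]
    if len(parts) != 3 or parts[1] != "land":
        continue                          # skip anything outside a land/ folder
    subdir, _, filename = parts
    data = gzip.decompress(blob.download_as_bytes())
    # Write into the sibling pro/ folder of the same sub-directory,
    # dropping the .gz suffix from the file name.
    dest = f"{subdir}/pro/{filename.removesuffix('.gz')}"
    bucket.blob(dest).upload_from_string(data)
    print(f"gs://sameer/{blob.name} -> gs://sameer/{dest}")
```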

2 Answers


It seems like your use case is simply moving files. In that case, I would suggest using the GCS Move or GCS Copy Action plugins.
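If the files only need to be relocated rather than transformed, and you end up scripting it instead, this is roughly what a per-object GCS move amounts to with the google-cloud-storage Python client (a hedged sketch; the path and file name are hypothetical, based on the asker's example):

```python
# Hedged sketch of what a per-object "GCS Move" boils down to when done
# with the google-cloud-storage Python client. The bucket and object
# names are hypothetical, taken from the asker's example layout.
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("sameer")

src = bucket.blob("student/land/data.csv.gz")   # hypothetical file name
# GCS has no in-place rename across "folders": a move is a server-side
# copy followed by a delete of the original object.
bucket.copy_blob(src, bucket, "student/pro/data.csv.gz")
src.delete()
```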

Edwin Elia

It seems like the task you are trying to carry out is not possible in a single Data Fusion pipeline, at least at the time of writing.

In a pipeline, all the sources and sinks have to be connected. Otherwise you will get the following error:

'Invalid DAG. There is an island made up of stages ...'

This means it is not possible to parallelise several decompression tasks, one for each folder of files, inside the same pipeline.

At the same time, if you were to use something like the following schema, the outputs would be aggregated and replicated over all of the sinks:

[Image: Multiple Sources and Sinks]

Finally, I would say that the only case in which you can parallelise a task between several sources and several sinks is when using multiple database tables. By means of the plugins (2) and (3) you can process data from multiple table inputs and export the output to multiple tables. If you would like to see all the available plugins for Data Fusion, please check the following link (4).

rodvictor