
I am trying to transfer multiple CSV files from an Azure storage container to a GCP bucket through a Data Fusion pipeline.

I can successfully transfer a single file by specifying the below path (the full path to a specific CSV file) in the 'path' field of the Azure Blob Storage source configuration:

'wasbs://containername@storageaccountname.blob.core.windows.net/CSVFile.csv'

But when I try to transfer multiple files from the 'containername' container, the below 'path' does not work (here I did not specify any file name after the '/', as I need to transfer all the files under this container):

'wasbs://containername@storageaccountname.blob.core.windows.net/'

It throws the exception 'Spark program 'phase-1' failed with error: / is not found. Please check the system logs for more details'.

Here I am using a SAS token for authentication, generated at the container level, which works perfectly with the full file path.

Is there any for-loop option in the GCP Data Fusion pipeline to iterate through the files? Thanks in advance.
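
(For reference, the Azure Blob Store batch source plugin documentation linked in the comments below mentions glob support for reading files under a directory, so a wildcard path along the following lines would be the natural thing to try; this exact pattern is an untested sketch.)

'wasbs://containername@storageaccountname.blob.core.windows.net/*.csv'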


Edit: A SAS token generated at the container level does not work for the path '/'. A SAS token generated at the storage account level can pick up all the files under that directory, but it combines the data from all the files into a single file (generated in Google Cloud Storage).

Can anyone help with how to transfer the files separately? Thanks.

  • Srini V: From the error message it is hard to identify the root cause of the pipeline failure. Could you check the pipeline logs and copy/paste the entire stack trace? That would be helpful for debugging the issue. – Ajai Apr 19 '21 at 23:18
  • @Ajai I have checked the log trace and noticed that it works for one file. For multiple files, when I omit the file name after the '/', it does not work. I might need to use a for-each loop to iterate through the files under that directory. Could you share your thoughts, please? – Srini V Apr 20 '21 at 00:05
  • Can you share what you are seeing in the logs? I am not sure what the failure is when the pipeline runs against a directory. The documentation for the plugin https://github.com/data-integrations/azure/blob/develop/azure-blob-store/docs/AzureBlobStore-batchsource.md#properties mentions using a glob to read files under a directory. It would be easier to debug if you could provide the logs you see when you run the pipeline. – Ajai Apr 20 '21 at 03:41
  • @Ajai Below is the trace log: 04/20/2021 14:56:35 INFO Pipeline 'be3e2e8c-a194-11eb-95d3-aa0f99b0091d' is started by user 'root' with arguments {logical.start.time=1618894592172, system.profile.name=SYSTEM:dataproc} 04/20/2021 14:56:35 INFO Pipeline 'be3e2e8c-a194-11eb-95d3-aa0f99b0091d' running 04/20/2021 14:56:54 ERROR Spark program 'phase-1' failed with error: / is not found. Please check the system logs for more details. 04/20/2021 14:56:54 ERROR Pipeline 'be3e2e8c-a194-11eb-95d3-aa0f99b0091d' failed. – Srini V Apr 20 '21 at 04:58
  • Srini - I understand you had already provided this error in the original question. The log ends with "check the system logs for more details". Can you check the appfabric logs to see the corresponding stack trace? That should give us more information on where the failure happens. In the meantime I will try to reproduce it locally with a custom Azure setup. – Ajai Apr 20 '21 at 06:08
  • @Ajai - The detailed log trace is huge and I could not paste it here, as the comments section allows fewer than 600 characters. Is there an alternative way to share it with you? Thanks – Srini V Apr 20 '21 at 21:25
  • Srini V: Can you share the logs in pastebin? – Ajai Apr 21 '21 at 03:56
  • @Ajai please find the link https://pastebin.pl/view/81fabcf9 – Srini V Apr 21 '21 at 05:14
  • Srini V: I just noticed your Edit. Is your goal to read all the files under the root dir in Azure Blob Storage and transfer them as individual files to GCS? – Ajai Apr 22 '21 at 21:21
  • Ajai: Yes. My requirement is to take all the files (CSV files) under the directory and transfer them to GCP storage as .txt files. But my current Data Fusion pipeline combines all the files and generates a single .txt file. Could you help with how to transfer individual files? Thanks – Srini V Apr 23 '21 at 06:04
  • @SriniV: Could you please confirm that the requirement to transfer files separately from Azure to GCS has been met by using Data Transfer instead of Data Fusion, as mentioned in the link below? https://stackoverflow.com/a/62349318/15831977 – Krish May 21 '21 at 09:34
  • @KrishanuSengupta: Yes, but through Data Transfer we are unable to change the extension of the file (from .csv to .txt). So I wrote a Cloud Function on the bucket, which triggers whenever a file is transferred (through Data Transfer), picks up that file, and changes the extension. Overall, I achieved my requirement through a two-step process (Data Transfer + Cloud Function); a sketch of such a function is shown below. – Srini V May 24 '21 at 12:16
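
As a rough sketch of the kind of Cloud Function described in the last comment, assuming a background function triggered by google.storage.object.finalize on the destination bucket and the google-cloud-storage client library (the function name and details are illustrative, not taken from the original setup):

    # Sketch of a GCS-triggered Cloud Function that renames newly transferred
    # .csv objects to .txt. Assumes the google-cloud-storage client library;
    # names are illustrative.
    from google.cloud import storage

    client = storage.Client()

    def rename_csv_to_txt(event, context):
        """Background function triggered by google.storage.object.finalize."""
        bucket_name = event["bucket"]
        object_name = event["name"]

        # Only act on CSV files; ignore everything else, including the
        # .txt copies this function itself creates.
        if not object_name.endswith(".csv"):
            return

        bucket = client.bucket(bucket_name)
        blob = bucket.blob(object_name)

        # GCS has no in-place rename: rename_blob copies the object to the
        # new name and then deletes the original.
        new_name = object_name[: -len(".csv")] + ".txt"
        bucket.rename_blob(blob, new_name)

Deployed with a finalize trigger on the destination bucket, this would fire once per transferred object, so each CSV stays a separate file and simply ends up with a .txt extension.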

0 Answers