I am trying to move CSV files from an SFTP folder to GCS using Data Fusion, but the pipeline fails with the error shown further below.

Here are the properties of both the FTP and GCS plugins. Surprisingly, I can see the data in Preview mode at every stage, but the pipeline fails when I try to deploy it. I also tried adding a CSVParser transform between the source (FTP) and the sink (GCS), but it still shows the same error. I am using the FTP plugin from the Hub, version 3.0.0. Please help me solve this.

[Screenshot: FTP and GCS plugin properties]

The error below appears when I try to deploy the pipeline, even though I was able to see the data in Preview.

[Screenshot: error shown when deploying the pipeline]

Mikhail Berlyant
Pasha Shaik
  • Can you confirm whether your FTP plugin is the same as the one indicated by Data Fusion? Ref: [ftp-plugins](https://github.com/data-integrations/ftp-plugins) – Betjens Mar 02 '22 at 17:32
  • Also, what configuration are you using inside your plugins? Can you share that as well? – Betjens Mar 03 '22 at 11:51

2 Answers

2

I solved this issue by changing the Pipeline execution engine from SPARK to MAPREDUCE in Data Fusion. Now it is working.
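The same switch can also be made on an exported pipeline JSON before re-importing it. The snippet below is only a minimal sketch: it assumes the exported file keeps the engine under `config.engine` with the values `spark` or `mapreduce`, and the helper name `set_engine` and the sample pipeline are hypothetical. Check your own export before relying on it.

```python
import json

# Assumed set of engine names; verify against your own exported pipeline.
VALID_ENGINES = {"spark", "mapreduce"}


def set_engine(pipeline: dict, engine: str) -> dict:
    """Return a copy of an exported pipeline dict with config.engine changed.

    Assumes the engine is stored under pipeline["config"]["engine"].
    """
    engine = engine.lower()
    if engine not in VALID_ENGINES:
        raise ValueError(f"unsupported engine: {engine!r}")
    # Round-trip through JSON to get a deep copy, so the input is not mutated.
    updated = json.loads(json.dumps(pipeline))
    updated.setdefault("config", {})["engine"] = engine
    return updated


# Hypothetical, heavily trimmed stand-in for an exported pipeline.
exported = {"name": "sftp_to_gcs", "config": {"engine": "spark", "stages": []}}

print(set_engine(exported, "MAPREDUCE")["config"]["engine"])  # prints "mapreduce"
```

In the UI, the equivalent setting is the engine selector in the pipeline configuration, which is what I changed.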

Pasha Shaik
0

I dug into this quite a bit and found that the ftp-plugins have a known issue when the pipeline runs outside of Preview, so at the moment there is not much you can do about the plugin itself. Fortunately, there are a few workarounds:

  • You can use an older Dataproc image (1.5 or 1.3), as indicated in the public issue that refers to this problem. For more details, see the issue "SFTP Source fails when deployed (SftpExecption) but not in preview". Don't forget to upvote it and leave a comment too.

  • Another way is to use the SFTPCopy plugin (once you get it from the Hub, it should appear under Conditions and Actions). It lets you copy the file from your SFTP server to a local path and then use the File source to continue processing it. There is a short guide on this: "Reading from SFTP and writing to BigQuery".

  • This one is a bit extreme, but you can also use a different workflow management platform, such as Airflow, for the file processing.
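For the Airflow option, a rough sketch of what the transfer could look like is below. It assumes Airflow 2.4 or later with the Google provider package installed and uses its `SFTPToGCSOperator`. The connection IDs, paths, bucket name, and the helper names `sftp_to_gcs_kwargs` and `build_dag` are all placeholders for illustration, not values from the question.

```python
from datetime import datetime


def sftp_to_gcs_kwargs(remote_glob: str, bucket: str, prefix: str) -> dict:
    """Build keyword arguments for SFTPToGCSOperator (all values are placeholders)."""
    return {
        "task_id": "sftp_csv_to_gcs",
        "source_path": remote_glob,             # remote path; one wildcard is allowed
        "destination_bucket": bucket,           # target GCS bucket name
        "destination_path": prefix,             # object prefix inside the bucket
        "sftp_conn_id": "ssh_default",          # assumed Airflow connection ID
        "gcp_conn_id": "google_cloud_default",  # assumed Airflow connection ID
        "move_object": False,                   # False copies; True also deletes the source
    }


def build_dag():
    """Create the DAG; Airflow is imported here so the helper works without it."""
    from airflow import DAG
    from airflow.providers.google.cloud.transfers.sftp_to_gcs import SFTPToGCSOperator

    with DAG(
        dag_id="sftp_csv_to_gcs",
        start_date=datetime(2022, 3, 1),
        schedule=None,  # trigger manually; requires Airflow 2.4+
        catchup=False,
    ) as dag:
        SFTPToGCSOperator(**sftp_to_gcs_kwargs("/upload/*.csv", "my-bucket", "incoming/"))
    return dag
```

In an actual DAG file you would assign `dag = build_dag()` at module level so that the scheduler can discover it.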

Betjens