
I have developed a real-time pipeline in Data Fusion that fetches data from Pub/Sub, writes it to GCS, and should then load it into BQ. However, after GCS (which is only available as a sink), I am not able to feed the data into BQ, because a sink does not produce an output schema. Is there any way to create a pipeline that takes the data from GCS to BQ?

  • If I understand it right you are trying to create a pipeline that writes from pubsub to GCS and then from GCS to bigquery? In its current state, a sink cannot connect to another sink plugin in a Data Fusion pipeline. Can you share your pipeline and say more about what you are trying to do? – Ajai Aug 20 '19 at 18:46
  • "If I understand it right you are trying to create a pipeline that writes from pubsub to GCS and then from GCS to bigquery?" Yes, this is exactly what I am trying to do. I was able to create a real-time pipeline that pulls data from Pub/Sub and writes it to GCS. But now I want to transform the data using Wrangler and push it into BQ. Since the GCS sink has no output schema, I am not able to do it. – user3746835 Aug 21 '19 at 06:21
  • You cannot connect a sink to another sink. You would have to write either to BigQuery or to GCS in a single pipeline, or you could connect Wrangler to both a BigQuery sink and a GCS sink so that the data is written to both simultaneously. It would be very helpful if you could share your pipeline. – Ajai Aug 21 '19 at 23:41

1 Answer


To provide a possible solution: it is not possible to connect a sink to another sink. Based on the question, my guess is that the OP is trying to connect the GCS sink plugin to the BQ sink and have data flow from one sink to the other. By design, that is not possible with Data Fusion pipelines.

Instead of writing to one sink after the other, the OP can push data from the Pub/Sub source to both the BQ and GCS sinks simultaneously. It would look something like this:

[image: screenshot of the pipeline, not preserved]
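In case it helps to see the two layouts side by side, here is a rough sketch in plain Python. It is not the Data Fusion API or its pipeline export format; the stage names, the `stages` mapping, and the `invalid_connections` helper are all hypothetical and only model the rule that a sink cannot have an outgoing connection.

```python
from typing import Dict, List, Tuple

Connection = Tuple[str, str]


def invalid_connections(stages: Dict[str, str],
                        connections: List[Connection]) -> List[Connection]:
    """Return every connection that starts at a sink stage."""
    return [(src, dst) for src, dst in connections if stages[src] == "sink"]


# Hypothetical stage names mapped to their kind (illustration only).
stages = {
    "PubSub": "source",
    "Wrangler": "transform",
    "GCS": "sink",
    "BigQuery": "sink",
}

# Layout from the question: GCS is used as an intermediate step.
chained = [("PubSub", "GCS"), ("GCS", "Wrangler"), ("Wrangler", "BigQuery")]

# Suggested layout: the transform fans out to both sinks.
fan_out = [("PubSub", "Wrangler"), ("Wrangler", "GCS"), ("Wrangler", "BigQuery")]

print(invalid_connections(stages, chained))  # [('GCS', 'Wrangler')]
print(invalid_connections(stages, fan_out))  # []
```

The chained layout is rejected because GCS would need an output, while the fan-out layout has no connection leaving a sink.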

Hope this helps.

Ajai