
I am trying to design a CDC pipeline to stream data from Cloud SQL to BigQuery using Datastream and Dataflow on GCP. The Datastream part is working fine, and I can see data being written to Cloud Storage successfully in Avro format.

For the Dataflow part, I am using the Dataflow template "Datastream to BigQuery" with the configuration shown in the screenshot.

I can see that the Dataflow job has started and is running with no errors in the log, yet no data is being transferred from Cloud Storage to BigQuery.

It looks to me like something is missing, namely the link between Cloud Storage and Pub/Sub. I think there should be a link that streams the data from GCS to Pub/Sub, so that Dataflow can eventually stream from Pub/Sub to BigQuery, no?

What am I missing here?

[Screenshot: Dataflow template configuration]

Karim Tawfik
  • According to this [documentation](https://cloud.google.com/dataflow/docs/guides/templates/provided-streaming#datastream-to-bigquery), the required parameters you provided for the Dataflow template `Datastream to BigQuery` seem to be fine. For streaming data from GCS to Pub/Sub, though, you can choose the Dataflow template [Text Files on Cloud Storage to BigQuery](https://cloud.google.com/dataflow/docs/guides/templates/provided-streaming#text-files-on-cloud-storage-to-bigquery-stream). Let me know if that helps. – Sourav Dutta Aug 19 '22 at 06:32
  • The issue is that the files on GCS are in Avro format. That template would also require creating a schema for each table in the database, which is not the right approach – Karim Tawfik Aug 22 '22 at 08:21
  • So, is your problem solved now, @Karim Tawfik? – Sourav Dutta Aug 22 '22 at 08:29
  • @SouravDutta, yes, it is now. I was missing the step that links GCS to Pub/Sub, which I did with the following command: `gsutil notification create -f "none" -p "mydb/" -t "datastream" "gs://"` – Karim Tawfik Aug 25 '22 at 12:16
  • @KarimTawfik can you please share the documentation that you used? I'm having the same issue as you, and the command `gsutil notification create ...` doesn't seem to help. I already have a topic and a subscription to it. What exactly should happen after this command? – Nina Sep 12 '22 at 12:23
  • @Nina Here you go: https://cloud.google.com/datastream/docs/implementing-datastream-dataflow-analytics#enable-pub-sub-notifs – Karim Tawfik Sep 15 '22 at 17:56
  • @Nina your problem could be that the wrong project is set for your gcloud command line. Run `gcloud config list` to see which project you are linked to before running the `gsutil` command; to update the project, use `gcloud config set project ` – Karim Tawfik Sep 15 '22 at 17:59

1 Answer


The missing piece was on my side: I had not set up the link between GCS and Pub/Sub. I created it with the command below:

gsutil notification create -f "none" -p "db/" -t "datastream" "gs://my-buk"
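For anyone following along, here is a rough sketch of how that step might fit into the surrounding setup. It is not taken from my actual project: the subscription name `datastream-sub` is hypothetical, the bucket, prefix, and topic are the placeholder values from the command above, and the `DRY_RUN` switch is just a convenience I added so the commands are printed for review instead of executed.

```shell
#!/bin/sh
# Sketch only: link a Datastream destination bucket to Pub/Sub.
# Every name below is a placeholder (an assumption), not a real resource.
set -eu

BUCKET="${BUCKET:-my-buk}"                      # bucket Datastream writes to
PREFIX="${PREFIX:-db/}"                         # path prefix of the stream in the bucket
TOPIC="${TOPIC:-datastream}"                    # Pub/Sub topic that receives notifications
SUBSCRIPTION="${SUBSCRIPTION:-datastream-sub}"  # hypothetical subscription name
DRY_RUN="${DRY_RUN:-1}"                         # 1 = print commands, 0 = execute them

# Print the command in dry-run mode, run it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

link_bucket_to_pubsub() {
  # Check which project the CLI is pointed at before creating anything.
  run gcloud config list
  # Create the topic and a subscription on it (skip these if they already exist).
  run gcloud pubsub topics create "$TOPIC"
  run gcloud pubsub subscriptions create "$SUBSCRIPTION" --topic="$TOPIC"
  # Publish a notification to the topic for objects under the prefix.
  run gsutil notification create -f none -p "$PREFIX" -t "$TOPIC" "gs://$BUCKET"
  # List the notification configs on the bucket to confirm the link exists.
  run gsutil notification list "gs://$BUCKET"
}

link_bucket_to_pubsub
```

Running it as-is only prints the five commands. Setting `DRY_RUN=0` executes them, in which case the script stops at the first failing command (for example, if the topic already exists). The subscription is what the Dataflow job would then be pointed at.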

Karim Tawfik