
I want to use Dataproc Spark to run two SQL files on BigQuery every minute and then write the results to Pub/Sub. I am not sure whether these two technologies can be used together. Can anyone who has already used Dataproc with Pub/Sub on GCP confirm whether this is possible?

Lara

1 Answer


After creating your cluster, you can use the BigQuery connector with Spark to execute the SQL queries on BigQuery. The spark-bigquery-connector lets Apache Spark read data from and write data to BigQuery.
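As a minimal sketch, a PySpark job on Dataproc could read the contents of one of your SQL files and run it through the connector. This assumes the spark-bigquery-connector JAR is available on the cluster; the file name `query1.sql` and the dataset `my_dataset` are placeholders for your own values.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bq-to-pubsub").getOrCreate()

# Hypothetical SQL file shipped with the job; replace with your own.
with open("query1.sql") as f:
    sql = f.read()

results = (
    spark.read.format("bigquery")
    # viewsEnabled and materializationDataset are required when using
    # the "query" option instead of reading a table directly.
    .option("viewsEnabled", "true")
    .option("materializationDataset", "my_dataset")
    .option("query", sql)
    .load()
)
results.show()
```

You could schedule this job to run each minute with Cloud Scheduler plus Dataproc Workflow Templates, or keep a single long-running streaming job instead.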

To write the results into Pub/Sub, create a Pub/Sub Lite topic and follow "Write Pub/Sub Lite messages by using Apache Spark", which shows how to read and write Pub/Sub Lite messages with PySpark from a Dataproc Spark cluster.
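A sketch of the write side, modeled on that guide. It assumes the Pub/Sub Lite Spark connector JAR is on the classpath; the project number, zone, topic name, and checkpoint path are placeholders, and the `rate` source stands in for wherever your BigQuery results come from.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("write-to-pubsub-lite").getOrCreate()

# Placeholder streaming source; swap in your own results pipeline.
sdf = spark.readStream.format("rate").option("rowsPerSecond", 1).load()

# The connector expects the message body in a BinaryType column named "data".
sdf = sdf.withColumn("data", col("value").cast("string").cast("binary"))

query = (
    sdf.writeStream.format("pubsublite")
    .option(
        "pubsublite.topic",
        "projects/123456789/locations/us-central1-a/topics/my-topic",
    )
    .option("checkpointLocation", "/tmp/app")  # required for streaming writes
    .outputMode("append")
    .start()
)
query.awaitTermination(60)
query.stop()
```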

Be sure your cluster has the necessary dependencies (connector JARs) to interact with BigQuery and Pub/Sub Lite, and that the cluster's service account has the proper authentication and authorization to access BigQuery and Pub/Sub resources from your Spark job.
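For example, you could attach both connectors when submitting the job. The cluster name, region, bucket, and JAR versions below are placeholders; check the connectors' release pages for current versions.

```shell
# Submit the PySpark job with both connector JARs attached.
gcloud dataproc jobs submit pyspark bq_to_pubsub.py \
  --cluster=my-cluster \
  --region=us-central1 \
  --jars=gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.36.1.jar,gs://my-bucket/pubsublite-spark-sql-streaming-with-dependencies.jar
```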

Poala Astrid