I want to use Dataproc Spark to run two SQL files on BigQuery every minute and then write the results to Pub/Sub. I'm not sure whether these two technologies can be used together. Can anyone who has already used Dataproc with Pub/Sub on GCP confirm whether this is possible?
After creating your cluster, you can use the BigQuery connector with Spark to run the SQL queries on BigQuery. The spark-bigquery-connector lets Apache Spark read data from and write data to BigQuery.
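As a sketch of the first half, assuming the two SQL files are available to the driver and a BigQuery dataset exists for materializing query results (the file names, dataset name, and app name below are placeholders), running a query file through the spark-bigquery-connector might look like:

```python
import pathlib


def load_sql(path):
    """Read a SQL file and return its text with surrounding whitespace stripped."""
    return pathlib.Path(path).read_text().strip()


def run_query(spark, sql_text):
    """Execute a SQL query on BigQuery and return the result as a DataFrame."""
    # The connector materializes query results into a temporary table,
    # so a materialization dataset must be configured for the 'query' option.
    return (
        spark.read.format("bigquery")
        .option("viewsEnabled", "true")
        .option("materializationDataset", "my_dataset")  # placeholder dataset
        .option("query", sql_text)
        .load()
    )


def main():
    # Imported here so load_sql() stays usable without a Spark installation.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("bq-to-pubsub").getOrCreate()
    for path in ["query1.sql", "query2.sql"]:  # placeholder file names
        df = run_query(spark, load_sql(path))
        df.show()
```

`main()` would be the entry point of the script you submit to the cluster; per-minute scheduling would then be handled outside the job (for example by whatever triggers the submission).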
To write the results into Pub/Sub, create a Pub/Sub Lite topic and follow the guide "Write Pub/Sub Lite messages by using Apache Spark", which shows how to read and write Pub/Sub Lite messages with PySpark from a Dataproc Spark cluster.
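A minimal sketch of the write side, assuming the Pub/Sub Lite Spark connector's sink (which, per its documentation, takes messages through a binary `data` column) and a placeholder topic path, could look like this. The `to_message_data` helper and the column packing are illustrative choices, not the only way to shape the payload:

```python
import json


def to_message_data(row_dict):
    """Serialize one result row to a JSON bytes payload for the message 'data' field."""
    return json.dumps(row_dict, sort_keys=True, default=str).encode("utf-8")


def write_to_pubsub_lite(df, topic_path):
    """Publish each row of a DataFrame as one Pub/Sub Lite message."""
    # Imported here so to_message_data() stays usable without a Spark installation.
    from pyspark.sql.functions import struct, udf
    from pyspark.sql.types import BinaryType

    # Pack every column of the row into a single JSON 'data' payload,
    # the binary column the Pub/Sub Lite sink reads messages from.
    pack = udf(lambda row: to_message_data(row.asDict()), BinaryType())
    out = df.select(pack(struct(*df.columns)).alias("data"))
    (
        out.write.format("pubsublite")
        .option("pubsublite.topic", topic_path)
        .save()
    )
```

The topic path is the full resource name, in the form `projects/PROJECT_NUMBER/locations/LOCATION/topics/TOPIC_ID`.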
Be sure the job has the necessary connector dependencies for BigQuery and Pub/Sub Lite on its classpath, and remember that the cluster's service account needs proper authentication and authorization to access the BigQuery and Pub/Sub resources from your Spark job.
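For example, submitting the job to an existing Dataproc cluster with both connectors pulled in via `--jars` could look like the following (the script name, cluster name, region, and connector versions are placeholders; check the connector release pages for current jar paths):

```shell
# Submit the PySpark job to a Dataproc cluster, adding the BigQuery and
# Pub/Sub Lite Spark connector jars to the job's classpath.
gcloud dataproc jobs submit pyspark bq_to_pubsub.py \
    --cluster=my-cluster \
    --region=us-central1 \
    --jars=gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.28.0.jar,gs://spark-lib/pubsublite/pubsublite-spark-sql-streaming-LATEST-with-dependencies.jar
```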

Poala Astrid