I have a task to implement a backend service (Java/Spring) running on Cloud Run and to configure a data pipeline in GCP that uses this service to automatically process Avro files with embedded schemas that are uploaded to a Cloud Storage bucket. When a new file is uploaded to the bucket, I need to process it and load it into BigQuery in a specific way.

So far I have successfully deployed the Spring application and designed the Avro schema. I found out that Google has an example of how to load Avro files into BigQuery, and I think it can be applied to this task.
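For reference, here is roughly what I have in mind based on that example, using the `google-cloud-bigquery` Java client (the dataset and table names are placeholders, not my actual setup):

```java
import com.google.cloud.bigquery.*;

public class AvroLoader {
    public static void loadAvro(String gcsUri) throws InterruptedException {
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
        // Placeholder dataset/table names
        TableId tableId = TableId.of("my_dataset", "my_table");
        LoadJobConfiguration config =
                LoadJobConfiguration.builder(tableId, gcsUri)
                        .setFormatOptions(FormatOptions.avro())
                        // Convert Avro logical types (e.g. timestamp-micros) to BigQuery types
                        .setUseAvroLogicalTypes(true)
                        .build();
        Job job = bigquery.create(JobInfo.of(config));
        job = job.waitFor();
        if (job.getStatus().getError() != null) {
            throw new RuntimeException(job.getStatus().getError().getMessage());
        }
    }
}
```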

I'm stuck on the upload event (or maybe the data pipeline configuration?). I really don't know how to handle file-upload events (I suppose I need to get the URI of a new file when it is uploaded). I tried reading about Google Dataflow, but I don't think that is what I need for this task. Could you please give me some advice on how to do this?


1 Answer


There are a few options:

- Use a Cloud Function configured to be triggered when a file is uploaded to the bucket.
- Subscribe to Pub/Sub notifications for Cloud Storage and push them to your Cloud Run service (see the sketch below).
- Use Apache Beam with the Google Cloud Dataflow runner.
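Since your service is already a Spring app on Cloud Run, the second option fits well: create a notification on the bucket (e.g. `gsutil notification create -t my-topic -f json -e OBJECT_FINALIZE gs://my-bucket`) and add a push subscription pointing at your service. A minimal sketch of the receiving endpoint, assuming the standard Pub/Sub push envelope (class, path, and record names here are illustrative, not an official API):

```java
import java.util.Map;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class GcsNotificationController {

    // Pub/Sub push wraps the event as {"message": {...}, "subscription": "..."}
    record PushRequest(PubsubMessage message) {}
    record PubsubMessage(Map<String, String> attributes, String data) {}

    @PostMapping("/pubsub")
    public ResponseEntity<Void> handle(@RequestBody PushRequest request) {
        Map<String, String> attrs = request.message().attributes();
        // Cloud Storage notifications carry the event type and object info as attributes
        if (!"OBJECT_FINALIZE".equals(attrs.get("eventType"))) {
            return ResponseEntity.noContent().build(); // ignore deletes, metadata updates, etc.
        }
        String gcsUri = "gs://" + attrs.get("bucketId") + "/" + attrs.get("objectId");
        // ... start the Avro -> BigQuery load for gcsUri here ...
        return ResponseEntity.noContent().build(); // any 2xx response acks the message
    }
}
```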

Under heavy load, trigger events can sometimes be duplicated, so you need to make sure that a single file won't be processed multiple times by your function, i.e. the processing should be idempotent. One way to deduplicate is sketched below.
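A minimal deduplication sketch, assuming the notification attributes from the controller above. The in-memory set is for illustration only; a real service would keep the keys in shared, durable storage such as Firestore or Redis:

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class DedupGuard {
    // In-memory for illustration; does not survive restarts or multiple instances
    private final Set<String> processed = ConcurrentHashMap.newKeySet();

    /** Returns true the first time a given upload is seen, false on duplicates. */
    public boolean shouldProcess(Map<String, String> attrs) {
        // bucket + object + generation uniquely identifies one upload of one file
        String key = attrs.get("bucketId") + "/" + attrs.get("objectId")
                + "#" + attrs.get("objectGeneration");
        return processed.add(key);
    }
}
```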

Sergey Geron
  • Thank you @Sergey, I literally just read about Cloud Functions, but I really appreciate that you mentioned the other two ways – MoldyBread Nov 01 '20 at 19:56