I expect to receive a stream of messages from an external website on a Pub/Sub topic. I use Data Fusion to build a simple realtime pipeline: Pub/Sub source > Wrangler > BigQuery sink. For my use case, I need to send a dummy message or some other kind of trigger to run an ML model only after the data has landed in BigQuery. How can I achieve this? Any suggestions are appreciated.

hansa29
  • If you stream the data, the stream never stops. So my question is: when exactly is the 'after data is landed' moment? What event or data tells you "OK, we can start the ML training now"? – guillaume blaquiere Sep 01 '22 at 19:18
  • Hi @guillaumeblaquiere, the stream comes from website users. Whenever there is an event on the website, the Pub/Sub topic receives it and the CDF pipeline stores the data in BQ. Once the event lands in BQ, the ML model needs to use it for training and send the result back to the website. Hope this is clear. – hansa29 Sep 01 '22 at 21:17
  • Do you mean training or prediction? And why are you waiting to have the event in BigQuery and why did you not consume it directly from PubSub? – guillaume blaquiere Sep 02 '22 at 12:44
  • It is prediction. Certain transformations need to be applied to the event data before it is fed to the ML model. We also have a data ocean where we want these events stored for future use. – hansa29 Sep 02 '22 at 13:44
  • Yes, but you have 2 flows. The first is a realtime flow: you take the event, enrich it with external data if needed, submit it to your model, get the inference result and return it to the website. The second flow stores the event in your data lake. So you have one hot flow and one cold flow. This is called the Lambda architecture: https://en.wikipedia.org/wiki/Lambda_architecture – guillaume blaquiere Sep 02 '22 at 14:25
  • Thank you for the suggestion, @guillaumeblaquiere. I'll try this approach after a quick discussion with our design architect. On a different note, I found this Google page about BQ triggers: https://cloud.google.com/blog/topics/developers-practitioners/how-trigger-cloud-run-actions-bigquery-events – hansa29 Sep 02 '22 at 15:28
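
For anyone landing here later, the hot flow suggested in the comments could look roughly like the sketch below. It is only a minimal illustration, not a tested production setup. It assumes a second, push-type subscription on the same topic delivers each event to an HTTP handler, while the existing Data Fusion pipeline keeps filling BigQuery as the cold flow. The names `transform`, `handle_event` and `predict`, and the fields `user_id` and `action`, are hypothetical placeholders; only the `message.data` base64 layout comes from the Pub/Sub push format.

```python
import base64
import json
from typing import Any, Callable, Dict


def decode_pubsub_push(envelope: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the JSON event from a Pub/Sub push envelope.

    Push deliveries wrap the payload as base64 text in envelope["message"]["data"].
    """
    raw = base64.b64decode(envelope["message"]["data"])
    return json.loads(raw.decode("utf-8"))


def transform(event: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for the Wrangler-style transformations (field names are assumed)."""
    return {
        "user_id": str(event.get("user_id", "")),
        "action": str(event.get("action", "")).strip().lower(),
    }


def handle_event(
    envelope: Dict[str, Any],
    predict: Callable[[Dict[str, Any]], Any],
) -> Dict[str, Any]:
    """Hot flow: decode the event, transform it, and call the model.

    `predict` is injected so that any model client can be plugged in;
    the caller is responsible for returning the result to the website.
    """
    event = decode_pubsub_push(envelope)
    features = transform(event)
    return {"event": event, "prediction": predict(features)}


if __name__ == "__main__":
    # Simulate one push delivery with a fake model.
    payload = json.dumps({"user_id": 42, "action": " Click "}).encode("utf-8")
    envelope = {"message": {"data": base64.b64encode(payload).decode("ascii")}}
    print(handle_event(envelope, predict=lambda features: len(features["action"])))
```

With this split, the prediction does not have to wait for the BigQuery write at all, so no "data has landed" trigger is needed for the realtime result.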

0 Answers