
I am new to GCP, and while reading the documentation about auto-tuning by the Dataflow service, I see it talks about a backlog and autoscaling that depends on it. In this context, what is the backlog? If my pipeline is reading from Pub/Sub, is it the age of the oldest message or the number of unacknowledged messages?

lookout
1 Answer


The backlog in Dataflow isn't the Pub/Sub backlog. Dataflow always pulls messages from Pub/Sub as soon as they arrive, but the processing queue can grow internally inside Dataflow: that queue is the backlog. If it gets too big and CPU consumption is too high, a new worker is added to the pipeline.
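To make the idea concrete, here is a minimal sketch of that scale-up condition. This is purely illustrative (the function name, thresholds, and inputs are assumptions, not Dataflow's actual internals): a worker is added only when the internal backlog is large *and* the existing workers are already CPU-bound.

```python
def should_scale_up(backlog_elements: int, cpu_utilization: float,
                    backlog_threshold: int = 10_000,
                    cpu_threshold: float = 0.8) -> bool:
    """Hypothetical heuristic: add a worker only when the internal
    backlog is large AND current workers are busy (high CPU)."""
    return backlog_elements > backlog_threshold and cpu_utilization > cpu_threshold

# Large backlog + saturated CPU -> scale up
print(should_scale_up(50_000, 0.95))  # True
# Large backlog but idle CPU: workers can still catch up on their own
print(should_scale_up(50_000, 0.30))  # False
```

Note that both conditions matter: a large backlog with idle CPUs suggests the current workers can drain it themselves, so no extra worker is needed.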

In streaming mode, you still have a backlog, but you also have a predictive backlog: Dataflow compares the number of messages in successive time windows, and if the count keeps increasing, that can be the beginning of a spike, so Dataflow can scale up proactively.
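The predictive part can be sketched like this. Again, this is an illustrative toy (the function name and growth factor are assumptions), not Dataflow's real algorithm: it compares message counts across consecutive windows and flags a sustained increase as a possible spike.

```python
def spike_starting(window_counts: list[int], growth_factor: float = 1.5) -> bool:
    """Hypothetical check: did the latest window grow by `growth_factor`
    over the previous one? If so, treat it as the start of a spike."""
    if len(window_counts) < 2:
        return False  # not enough history to compare
    prev, latest = window_counts[-2], window_counts[-1]
    return prev > 0 and latest >= prev * growth_factor

# Counts per window jump from 110 to 300 -> likely spike, scale proactively
print(spike_starting([100, 110, 300]))  # True
# Gentle growth stays below the factor -> no proactive scaling
print(spike_starting([100, 110, 120]))  # False
```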

guillaume blaquiere
  • @guillaume_blaquiere Thanks for the explanation. I understood everything you said except the second sentence. What do you mean by "Dataflow always get a message from PubSub when it is here"? – lookout May 28 '21 at 14:32
  • Excuse my English ;) Dataflow creates a pull connection to Pub/Sub and fetches the messages immediately. You don't have a backlog in the Pub/Sub subscription; the subscription is normally empty. – guillaume blaquiere May 28 '21 at 15:13