I have a very simple streaming pipeline that reads from Pub/Sub, runs inference with a TensorFlow model, and then writes the result back to Pub/Sub:
with beam.Pipeline(options=pipeline_options) as pipeline:
    _ = (
        pipeline
        | 'PSRead' >> beam.io.ReadFromPubSub(
            subscription=read_subscription_name,
            with_attributes=True,
            id_label='message_id')
        | 'RunModel' >> RunInference(ModelHandler())
        | 'PSWrite' >> beam.io.WriteToPubSub(write_topic_name, with_attributes=True)
    )
Ideally, this pipeline would leverage Dataflow's autoscaling to act as a simple, scalable work queue: when there is a backlog of messages to run inference on, add more workers, run a copy of this entire pipeline against the same subscription on each one, and compete for work items until the queue is empty, then scale back down. However, I cannot get this to autoscale up from 1 worker at all, and I'm wondering how I should expect this to work with beam.io.ReadFromPubSub as my source.

The documentation for both Dataflow and Beam is pretty unclear on this, but I think I'm supposed to be assigning keys somehow to the messages that come out of beam.io.ReadFromPubSub (because keys determine parallelism? I really don't understand this...). If that's the case, how do I do it? Is each message already its own key? Will the beam.io.ReadFromPubSub connector itself scale out to each worker, or should I expect a single instance of the connector and only the RunModel step to scale across multiple workers?
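For what it's worth, this is the kind of keying I imagined after looking at other Beam examples. The 'AssignKeys'/'DropKeys' steps and the uuid-based keys are purely my own guess, not anything from the Beam or Dataflow docs, and I don't know whether the runner would actually use them to scale:

import uuid

import apache_beam as beam
from apache_beam.ml.inference.base import RunInference

# pipeline_options, read_subscription_name, write_topic_name, and
# ModelHandler are the same placeholders as in the snippet above.
with beam.Pipeline(options=pipeline_options) as pipeline:
    _ = (
        pipeline
        | 'PSRead' >> beam.io.ReadFromPubSub(
            subscription=read_subscription_name,
            with_attributes=True,
            id_label='message_id')
        # Guess: give every message its own key so that, if parallelism
        # really is keyed, each message can be scheduled independently.
        | 'AssignKeys' >> beam.Map(lambda msg: (uuid.uuid4().hex, msg))
        # Strip the keys back off before inference, since I assume
        # RunInference wants the bare messages, not (key, value) pairs.
        | 'DropKeys' >> beam.Map(lambda kv: kv[1])
        | 'RunModel' >> RunInference(ModelHandler())
        | 'PSWrite' >> beam.io.WriteToPubSub(write_topic_name, with_attributes=True)
    )

Is something like that even close, or does the keying have to happen somewhere else entirely?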