
I am a beginner with Dataflow, and there is one concept I'm not sure I understand: the "state".

When the documentation talks about the pipeline state, does it mean the data in the pipeline? For example, when taking a Dataflow snapshot, the documentation says there are two options:

  1. Take a snapshot of the pipeline state in Dataflow only.
  2. Take a snapshot as described in 1, plus a snapshot of the Pub/Sub source.

The documentation

Does the state in option 1 mean the pipeline itself (the DAG) and the data in flight? What does "state" mean here? And if the data in flight is saved, why do we also need to take a snapshot of the source?

Thank you

Guy

user1021712

1 Answer


Yes, it means the running pipeline and the data in flight. With a snapshot, you can recreate the state of the running job in a new job that runs a newer version of the pipeline. It's basically a way of updating a streaming job without draining it.

The snapshot of the source is specific to Pub/Sub. It is taken so that, when a job reads from the existing subscription, it knows the ack state of the in-flight messages.
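To make the two options concrete, here is a minimal sketch that builds the `gcloud dataflow snapshots create` command for each case. The helper name `snapshot_command`, the job ID, the region, and the `7d` TTL are placeholders of mine, not something from the question, and the flags are as I remember them from the gcloud reference, so please check them against `gcloud dataflow snapshots create --help` before relying on this.

```python
import shlex


def snapshot_command(job_id, region, include_source=False, ttl="7d"):
    """Build the argv for `gcloud dataflow snapshots create`.

    job_id, region and ttl are caller-supplied placeholders.
    """
    cmd = [
        "gcloud", "dataflow", "snapshots", "create",
        f"--job-id={job_id}",
        f"--region={region}",
        f"--snapshot-ttl={ttl}",
    ]
    if include_source:
        # Option 2: also snapshot the Pub/Sub source,
        # in addition to the pipeline state.
        cmd.append("--snapshot-sources=true")
    return cmd


if __name__ == "__main__":
    # Print the commands instead of running them, so nothing
    # touches a real project.
    print(shlex.join(snapshot_command("my-job-id", "us-central1")))
    print(shlex.join(snapshot_command("my-job-id", "us-central1",
                                      include_source=True)))
```

The only difference between option 1 and option 2 in this sketch is the extra `--snapshot-sources=true` flag; everything else about the snapshot request stays the same.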

ningk