I am a beginner in Dataflow. There is one concept I'm not sure I understand: "state".
When the documentation talks about pipeline state, does it mean the data in the pipeline? For example, when taking a Dataflow snapshot, the documentation says there are two options:
- Take a snapshot of only the pipeline state in Dataflow.
- Take a snapshot as described in the first option, plus a snapshot of the Pub/Sub source.
Does the state in the first option mean the pipeline itself (the DAG) and the data in flight? What does "state" mean? And if the data in flight is saved, why do we also need to take a snapshot of the source?
Thank you
Guy