12

I've created a standard Pub/Sub to BigQuery Dataflow job. However, to make sure I wouldn't run up a huge bill while offline, I cancelled the job. From the GCP console, there doesn't seem to be an option to restart it. Is this possible, either through the console or through the shell (and if so, how)?

Andrew Mo
Paul Michaels

2 Answers

15

Cloud Dataflow currently does not provide a mechanism to restart a Dataflow job that has been stopped or cancelled.

However, for this Pub/Sub -> BigQuery flow, one way to approach this would be to use the Google-provided Pub/Sub to BigQuery template; these templates provide code-free solutions for common data movement patterns using Cloud Dataflow.

You can execute a streaming Dataflow job using this template, via the REST API, using a unique job name to ensure that there is only one instance of this Dataflow job running at any point in time. If the job were cancelled, you could (re)start this streaming Dataflow job by running the same command again.
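As a rough sketch of that REST call, the Python below builds and sends a `templates:launch` request for the Google-provided template. It assumes the regional `projects.locations.templates.launch` endpoint, the public template path `gs://dataflow-templates/latest/PubSub_to_BigQuery`, and the template parameters `inputTopic` and `outputTableSpec`; check these against the current template documentation. The project, region, topic, table, and job name are hypothetical placeholders, and the access token is assumed to come from something like `gcloud auth print-access-token`.

```python
import json
import urllib.parse
import urllib.request

# Assumed location of the Google-provided template; verify before use.
TEMPLATE_PATH = "gs://dataflow-templates/latest/PubSub_to_BigQuery"


def build_launch_request(project, region, job_name, input_topic, output_table):
    """Return the (url, body) pair for a templates:launch call."""
    query = urllib.parse.urlencode({"gcsPath": TEMPLATE_PATH})
    url = (
        "https://dataflow.googleapis.com/v1b3/"
        f"projects/{project}/locations/{region}/templates:launch?{query}"
    )
    body = {
        # Reusing the same job name each time means a second launch is
        # rejected while a job with that name is still active.
        "jobName": job_name,
        "parameters": {
            "inputTopic": input_topic,        # projects/<project>/topics/<topic>
            "outputTableSpec": output_table,  # <project>:<dataset>.<table>
        },
    }
    return url, body


def launch(url, body, access_token):
    """POST the launch request and return the parsed JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

To (re)start the job, call `build_launch_request(...)` with the same arguments as before and pass the result to `launch(...)` along with a fresh access token.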

Andrew Mo
  • 8
    This is insane that one can't restart a data flow task that fails... over a year later – Greg Hilston Aug 28 '19 at 20:02
  • 4
    WTF? You can't restart a job that stopped. – neildo Feb 25 '20 at 19:51
  • 8
    2020 and it is still not possible :) – dbustosp Mar 10 '20 at 22:19
  • 2
    It's 2021 and still not there. Although, you can "Clone" it via GUI and that will start the job – crtag Jun 24 '21 at 03:10
  • 1
    Except when the `CLONE` button is disabled :( – Matt Byrne Dec 22 '21 at 01:36
  • 1
    It's 2022 and still not possible. Also, when the job is created, if you add a link to a custom UDF say in a bucket, you can't just replace the function js file in the bucket and expect a running job to pick up the changes. You have to stop the job and clone it to a new one (even if it's with the exact same path of the UDF function as the original job). Eeeesh! – Bish May 19 '22 at 17:09
  • The year is 2023. A global pandemic hit Earth and the world went into lockdown. Vaccines were made and the human race successfully fought off the virus. Meanwhile, at Google, they still haven't added the functionality to edit and rerun a Dataflow job. – David Sigley Mar 01 '23 at 13:39
3

You can restart the job immediately by cloning it. You should see a "Clone" option at the top.