I found an article about running a Dataflow batch job on preemptible machines.

I tried to use this feature with this script:

gcloud beta dataflow jobs run $JOB_NAME \
    --gcs-location gs://.../Datastore_to_Datastore_Delete \
    --flexRSGoal=COST_OPTIMIZED \
    --region ...1 \
    --staging-location gs://.../temp \
    --network XXX \
    --subnetwork regions/...1/subnetworks/... \
    --max-workers 1 \
    --parameters \
datastoreReadGqlQuery="$QUERY",\
datastoreReadProjectId=$PROJECTID,\
datastoreDeleteProjectId=$PROJECTID

But this is the result:

ERROR: (gcloud.beta.dataflow.jobs.run) unrecognized arguments: --flexRSGoal=COST_OPTIMIZED

To search the help text of gcloud commands, run: gcloud help -- SEARCH_TERMS

I ran the command gcloud beta dataflow jobs run --help, and it seems the flexRSGoal option is not there...

# gcloud version
Google Cloud SDK 319.0.0
alpha 2020.11.13
beta 2020.11.13
bq 2.0.62
core 2020.11.13
gsutil 4.55
kubectl 1.16.13

What am I missing?

No1Lives4Ever

2 Answers

Have you followed this? It seems that the correct flag should be:

--flexrs_goal=COST_OPTIMIZED

ziqi

It seems the --flexrs_goal flag [1] is not intended for the gcloud beta dataflow jobs run command, but for the Java/Python pipeline launchers, for example the python3 -m ... commands shown in [2] (a complete read of that doc is recommended).

So instead of using:

gcloud beta dataflow jobs run <job_name> \
    --flexRSGoal=COST_OPTIMIZED ...

Run:

python3 <my-pipeline-script.py> \
  --flexrs_goal=COST_OPTIMIZED ...
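For reference, Beam's Python SDK simply reads --flexrs_goal from the command line like any other pipeline option. The sketch below mimics that parsing with argparse so it runs without apache_beam installed; parse_flexrs is a hypothetical helper for illustration, not a Beam API.

```python
import argparse

def parse_flexrs(argv):
    """Mimic how a pipeline launcher would pick --flexrs_goal out of argv.

    Returns the goal (or None if absent) plus the remaining, unparsed
    arguments, the way Beam's PipelineOptions uses parse_known_args.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--flexrs_goal",
        choices=["COST_OPTIMIZED", "SPEED_OPTIMIZED"],  # valid FlexRS goals
        default=None,
    )
    args, remaining = parser.parse_known_args(argv)
    return args.flexrs_goal, remaining

goal, rest = parse_flexrs(["--flexrs_goal=COST_OPTIMIZED", "--region=us-central1"])
print(goal)  # COST_OPTIMIZED
```

In a real pipeline you would pass the whole command line to PipelineOptions instead; the point is only that the flag belongs to the pipeline program's arguments, not to gcloud.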

If you prefer to use Java, just switch the --flexrs_goal flag to --flexRSGoal and follow [3] instead of [2].

[1] https://cloud.google.com/dataflow/docs/guides/flexrs#python

[2] https://cloud.google.com/dataflow/docs/quickstarts/quickstart-python#run-wordcount-on-the-dataflow-service

[3] https://cloud.google.com/dataflow/docs/quickstarts/quickstart-java-maven

Sarrión