I'm running into a problem while testing out Dataflow, running code like this from a Datalab cell:
import apache_beam as beam
# Pipeline options:
options = beam.options.pipeline_options.PipelineOptions()
gcloud_options = options.view_as(beam.options.pipeline_options.GoogleCloudOptions)
gcloud_options.job_name = 'test002'
gcloud_options.project = 'proj'
gcloud_options.staging_location = 'gs://staging'
gcloud_options.temp_location = 'gs://tmp'
# gcloud_options.region = 'europe-west2'
# Worker options:
worker_options = options.view_as(beam.options.pipeline_options.WorkerOptions)
worker_options.disk_size_gb = 30
worker_options.max_num_workers = 10
# Standard options:
options.view_as(beam.options.pipeline_options.StandardOptions).runner = 'DataflowRunner'
# options.view_as(beam.options.pipeline_options.StandardOptions).runner = 'DirectRunner'
# Pipeline:
PL = beam.Pipeline(options=options)
query = 'SELECT * FROM [bigquery-public-data:samples.shakespeare] LIMIT 10'
(
PL | 'read' >> beam.io.Read(beam.io.BigQuerySource(project='project', use_standard_sql=False, query=query))
| 'write' >> beam.io.WriteToText('gs://test/test2.txt', num_shards=1)
)
PL.run()
print('Complete')
There have been various successful attempts and a few that have failed. That's fine and understood, but what I don't understand is what I have done to change the SDK version from 2.9.0 to 2.0.0, as shown below. Could anyone point out what I've done, and how to move back up to SDK version 2.9.0, please?
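For what it's worth, the SDK version that Dataflow reports for a job comes from whichever apache_beam package is importable in the kernel that submitted it, so one way to narrow this down is to print `apache_beam.__version__` in the same Datalab cell before calling `PL.run()`, and pin the version with pip if it has drifted. A minimal sketch (the `version_tuple` helper below is my own illustration, not part of Beam):

```python
# In the Datalab kernel, the version Dataflow will report can be checked with:
#   import apache_beam
#   print(apache_beam.__version__)
# and pinned back with (restart the kernel afterwards):
#   !pip install apache-beam[gcp]==2.9.0

def version_tuple(v):
    """Parse a dotted version string like '2.9.0' into (2, 9, 0) so
    versions compare numerically rather than lexicographically."""
    return tuple(int(p) for p in v.split('.'))

# Numeric comparison gets this right where string comparison also happens to,
# but e.g. '2.10.0' vs '2.9.0' would compare wrongly as strings:
print(version_tuple('2.9.0') > version_tuple('2.0.0'))   # newer SDK
print(version_tuple('2.10.0') > version_tuple('2.9.0'))  # string compare would say False
```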