
I am trying to pass the Bigtable tableId, instanceId, and projectId, which are defined as ValueProvider in the TemplateOptions class, at execution time, since they are runtime values, but the new values are not honored: the pipeline executes with the old values that were defined when the pipeline was constructed. What changes should I make so that it honors the values supplied at runtime?

Pipeline p = Pipeline.create(options);

com.google.cloud.bigtable.config.BigtableOptions.Builder optionsBuilder =
    new com.google.cloud.bigtable.config.BigtableOptions.Builder()
        .setProjectId("my-project");

PCollection<com.google.bigtable.v2.Row> row = p.apply("filtered read",
    org.apache.beam.sdk.io.gcp.bigtable.BigtableIO.read()
        .withBigtableOptions(optionsBuilder)
        .withoutValidation()
        .withInstanceId(options.getInstanceId())
        .withProjectId(options.getProjectId())
        .withTableId(options.getTableId()));
PCollection<KV<Integer, String>> convertToKV = row.apply(ParDo.of(new ConvertToKV()));

My options interface looks like this:

@Default.String("my-project")
@Description("The Google Cloud project ID for the Cloud Bigtable instance.")
ValueProvider<String> getProjectId();
void setProjectId(ValueProvider<String> projectId);

@Default.String("my-instance")
@Description("The Google Cloud Bigtable instance ID .")
ValueProvider<String> getInstanceId();
void setInstanceId(ValueProvider<String> instanceId);

@Default.String("my-test")
@Description("The Cloud Bigtable table ID in the instance." )
ValueProvider<String> getTableId();
void setTableId(ValueProvider<String> tableId);

@Description("bucket name")
@Default.String("mybucket")
ValueProvider<String> getBucketName();
void setBucketName(ValueProvider<String> bucketName);

Any help would be really appreciated.

  • Don't specify values at construction time. What is specified at construction time, stays at what was specified; what isn't specified, will take values at runtime. – jkff Apr 02 '18 at 16:56
  • If I don't specify the values of tableId, instanceId and projectId, which are of ValueProvider type, it throws an error at construction time. This is the error I am getting: Caused by: java.lang.IllegalArgumentException: tableId was not supplied at com.google.common.base.Preconditions.checkArgument(Preconditions.java:122) at org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$BigtableSource.validate(BigtableIO.java:995) at org.apache.beam.sdk.io.Read$Bounded.expand(Read.java:98) – Rohit Nigam Apr 02 '18 at 17:11
  • This is what I am using to construct the template: mvn -X compile exec:java -Dexec.mainClass=com.grid.GridProcessingPipeline -Dexec.args="--runner=DataflowRunner --project=my-project --stagingLocation=gs://my-bucket/staging --tempLocation=gs://my-bucket/temp/ --templateLocation=gs://my-bucket/templates/MyTemplate" – Rohit Nigam Apr 02 '18 at 17:15
  • Ah. This is a bug in BigtableIO. Would you mind filing a JIRA at https://issues.apache.org/jira/browse/BEAM or emailing the user@beam.apache.org mailing list? – jkff Apr 03 '18 at 23:34

3 Answers


I do believe that validating runtime parameters at construction time is an issue. However, what I don't understand is why the runtime parameters passed when executing the pipeline from the template are not honored.

How do you pass your runtime parameters? It should be something like this:

  public interface WordCountOptions extends PipelineOptions {
    @Description("Path of the file to read from")
    @Default.String("gs://dataflow-samples/shakespeare/kinglear.txt")
    ValueProvider<String> getInputFile();
    void setInputFile(ValueProvider<String> value);
  }

  public static void main(String[] args) {
    WordCountOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation()
            .as(WordCountOptions.class);
    Pipeline p = Pipeline.create(options);
    // ... build the pipeline using options.getInputFile() ...
    p.run();
  }

See "create template" for details: https://cloud.google.com/dataflow/docs/templates/creating-templates

Once the template is constructed, you can execute the pipeline with runtime parameters. For example:

gcloud beta dataflow jobs run test-run1 \
        --gcs-location gs://my_template/templates/DemoTemplate \
        --parameters inputFile=/path/to/my-file

See "Execute templates" for details: https://cloud.google.com/dataflow/docs/templates/executing-templates

Note: If you don't pass runtime parameters when executing your pipeline, the parameters will either have their default values or be null.
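The other crucial detail is that a ValueProvider must be read with get() only at runtime, never while the pipeline is being constructed. As a minimal sketch (this DoFn is illustrative and not from the asker's pipeline, though getBucketName() is one of the asker's own options), a transform should hold the ValueProvider itself and defer the get() call:

  // Illustrative sketch: hold the ValueProvider, call get() only at runtime.
  static class TagWithBucket extends DoFn<String, String> {
    private final ValueProvider<String> bucketName;

    TagWithBucket(ValueProvider<String> bucketName) {
      this.bucketName = bucketName; // do NOT call get() here
    }

    @ProcessElement
    public void processElement(ProcessContext c) {
      // By execution time the runtime value has been supplied, so get() is safe.
      c.output(bucketName.get() + "/" + c.element());
    }
  }

  // Usage: p.apply(ParDo.of(new TagWithBucket(options.getBucketName())));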

Hope this helps!

– Kevin Si
  • You can check out the Google-provided templates (https://cloud.google.com/dataflow/docs/guides/templates/provided-batch) and their source code (https://github.com/GoogleCloudPlatform/DataflowTemplates). – Tracy Cui Sep 20 '19 at 19:42

I believe that the --inputFile values are bundled in with the template when the template is created.

Please see this note from the templates documentation: "In addition to the template file, templated pipeline execution also relies on files that were staged and referenced at the time of template creation. If the staged files are moved or removed, your pipeline execution will fail."

This thread seems relevant as well.

– brugz

Update:

With Flex Templates, we can easily pass values at runtime:

https://cloud.google.com/dataflow/docs/guides/templates/using-flex-templates#top_of_page
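For example, a Flex Template run supplies the values at execution time (a sketch; the job name, template path, and parameter values are placeholders):

gcloud dataflow flex-template run my-job \
    --template-file-gcs-location gs://my-bucket/templates/my-template.json \
    --parameters tableId=my-test,instanceId=my-instance,projectId=my-project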

Old:

We were also facing the same exception. To fix it, we added dummy default values for the ValueProvider configs, did not pass the values at template-construction time, and passed them only at runtime; that worked fine.
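Concretely (a sketch in the style of the execute-templates command shown in the first answer; names and values are placeholders), create the template without passing those parameters, then run it with the real values:

gcloud beta dataflow jobs run my-run \
    --gcs-location gs://my-bucket/templates/MyTemplate \
    --parameters tableId=real-table,instanceId=real-instance,projectId=real-project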

– SANN3