
I'd like to use some configs for a library that's used both on Dataflow and in a normal environment.

Is there a way for the code to check whether it's running on Dataflow? I couldn't find an environment variable that indicates this, for example.

Quasi-follow-up to Google Dataflow non-python dependencies - separate setup.py?

GreenMatt
Maximilian

2 Answers


One option is to use PipelineOptions, which carries the pipeline runner information. As the Beam documentation puts it: "When you run the pipeline on a runner of your choice, a copy of the PipelineOptions will be available to your code. For example, you can read PipelineOptions from a DoFn’s Context."

More about PipelineOptions: https://beam.apache.org/documentation/programming-guide/#configuring-pipeline-options
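
For instance, in the Python SDK the runner can be inspected at pipeline-construction time. A minimal sketch (the import path assumes a Beam 2.x release, and runner will be None if --runner was never set):

from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

# PipelineOptions() parses command-line flags (sys.argv) by default.
options = PipelineOptions()

# StandardOptions.runner is whatever --runner was set to, or None if unset.
if options.view_as(StandardOptions).runner == 'DataflowRunner':
    pass  # Dataflow-specific configuration here

Note this reflects the runner the pipeline was launched with, not a check performed inside worker code.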

Andy Xu
  • How would you access the options in DataFlow? – Maximilian Nov 08 '17 at 19:04
  • You can use getPipelineOptions in a DoFn: https://beam.apache.org/documentation/sdks/javadoc/0.5.0/org/apache/beam/sdk/transforms/DoFn.Context.html#getPipelineOptions-- – Andy Xu Nov 08 '17 at 19:16
  • Thanks. Any idea of how to do that in Python? Looks like it's not available here: https://cloud.google.com/dataflow/pipelines/specifying-exec-params . But often stuff is in the code even when it's not in the DataFlow docs – Maximilian Nov 08 '17 at 19:45
  • Unfortunately, the Python SDK currently does not support getting PipelineOptions in a DoFn :( – Andy Xu Nov 09 '17 at 18:05
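
Given the limitation noted in the last comment, one workaround is to capture the runner name at pipeline-construction time and hand it to the DoFn yourself. A sketch under assumptions (RunnerAwareDoFn is a hypothetical name, not a Beam API):

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

class RunnerAwareDoFn(beam.DoFn):
    # Hypothetical helper: the runner name is stored on the DoFn, so it is
    # pickled with it and available on the workers.
    def __init__(self, runner_name):
        super().__init__()
        self.runner_name = runner_name

    def process(self, element):
        if self.runner_name == 'DataflowRunner':
            pass  # Dataflow-specific behaviour here
        yield element

options = PipelineOptions()
fn = RunnerAwareDoFn(options.view_as(StandardOptions).runner)

Again, this only tells you which runner the pipeline was launched with; it doesn't probe the environment the worker is actually running in.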

This is not a good answer, but it may be the best we can do at the moment:

import os

# Dataflow worker hostnames contain "harness"; a rough heuristic:
if 'harness' in os.environ.get('HOSTNAME', ''):
    ...  # running on Dataflow
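
This keys off the fact that Dataflow worker VM hostnames happen to contain "harness", which is an undocumented implementation detail and could change.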
Maximilian