We need a non-Python dependency installed into our Dataflow workers: an ODBC driver, so that the pipeline can access an MSSQL database.

We've written a setup.py that successfully installs it, following the steps here: https://cloud.google.com/dataflow/pipelines/dependencies-python#non-python-dependencies

We want to keep our original setup.py for the package (which doesn't install those extra dependencies); is there a way of using a different setup.py for Dataflow installs?

We tried:

  • calling it setup_dataflow.py, but Dataflow raised an error stating that the file needed to be called setup.py.
  • following the steps here and using a setup.py in a child directory of the root path, but that didn't work either.

We could try an if statement within setup.py to detect whether it's being installed in a Dataflow environment, though I couldn't find any reliable environment variables that identify this.

Any advice / suggestions?

Thanks

Maximilian

1 Answer

Currently there's no convenient way to do this. You could have two different packages, something like so:

dataflow_pipeline/
    setup.py
original_pipeline/
    setup.py
    pipeline.py

Here, dataflow_pipeline/setup.py simply imports original_pipeline and adds the extra dependencies.

It's not ideal, but it should work.
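
As a rough illustration, the Dataflow-only setup.py could look something like the sketch below. Treat it as a sketch under assumptions rather than a tested recipe: the version string, the placeholder install command, the helper names run_custom_commands and dataflow_setup_kwargs, and hooking into build_py are all made up for this example. It also assumes that original_pipeline can be installed on the workers by name, and that the wrapper package ships at least one Python module so that build_py actually runs.

```python
import subprocess
import sys

# Hypothetical placeholder: swap in the real commands that install the
# ODBC driver (for example the package-manager calls from the Dataflow docs).
CUSTOM_COMMANDS = [
    [sys.executable, "-c", "print('install the ODBC driver here')"],
]


def run_custom_commands(commands=CUSTOM_COMMANDS):
    """Run each command in order; raise CalledProcessError on the first failure."""
    for command in commands:
        subprocess.check_call(command)
    return len(commands)


def dataflow_setup_kwargs():
    """Build the keyword arguments for setuptools.setup()."""
    # Imported lazily so the helper above can be used without setuptools.
    from setuptools.command.build_py import build_py

    class BuildPyWithCustomCommands(build_py):
        """Runs the extra install commands before the normal build_py step."""

        def run(self):
            run_custom_commands()
            super().run()

    return {
        "name": "dataflow_pipeline",
        "version": "0.0.1",  # hypothetical version
        # Assumption: the original package is installable on the workers.
        "install_requires": ["original_pipeline"],
        "cmdclass": {"build_py": BuildPyWithCustomCommands},
    }
```

In the real dataflow_pipeline/setup.py the file would end with a call such as setuptools.setup(**dataflow_setup_kwargs()), and the original setup.py in original_pipeline stays untouched.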

Pablo