I'm starting this thread with an answer, not a question. The questions are stated at the end:
I tried to add the pip package 'tfx' to Apache Airflow using my own Dockerfile and docker-compose.yaml. When I added my own DAG to Airflow, it failed to load with this error message:
doc_controls has no attribute 'inheritable_header'
It cost me only a day to find the cause. When you add this to your Dockerfile..
pip install tfx
..pip will install tfx, tensorflow-2.6.0 and tensorflow-estimator-2.7.0. The latter apparently depends on not-yet-released code in the GitHub repo tensorflow/docs, which contains doc_controls.
So instead add this to keep tensorflow-estimator in line with packages that pip can find:
RUN pip install --no-cache-dir --user \
tfx==1.3.1 \
tensorflow==2.6.0 \
tensorflow-estimator==2.6.0
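To catch this kind of drift before Airflow tries to load a DAG, a small check can be run inside the built image. The following is only a sketch: the helper names (minor_series, installed_versions, same_series) are hypothetical, and the rule that tensorflow and tensorflow-estimator must share the same major.minor series is an assumption taken from the pins above, not a documented guarantee.

```python
from importlib import metadata


def minor_series(version):
    """Return the (major, minor) part of a version string such as '2.6.0'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def same_series(versions):
    """True when every installed package shares one (major, minor) series.

    Assumption: matching major.minor is a good enough compatibility check.
    """
    series = {minor_series(v) for v in versions.values() if v is not None}
    return len(series) <= 1


if __name__ == "__main__":
    versions = installed_versions(["tensorflow", "tensorflow-estimator"])
    print(versions)
    if not same_series(versions):
        raise SystemExit("tensorflow and tensorflow-estimator are out of step")
```

Running this as a build step would make the image build fail early instead of surfacing later as a DAG import error.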
I'm losing a lot of time solving dependency problems, both between pip packages and between pip packages and the underlying C/C++ libraries. Am I the only one?
Here are my questions:
Am I correct to assume that pip is supposed to figure out which versions of the dependencies of tfx to install? Should I normally be able to rely on pip to do this correctly, or will pip simply install the latest version of all dependencies without regard to their mutual compatibility?
On the internet there are many Dockerfiles that do not specify any version numbers for the apt/pip packages they install. Such a Dockerfile is like a box of chocolates, right? If you build the same Dockerfile at time t1 and again at time t2, the resulting images can differ in package versions, right?
In general: given a Docker image, why can one not get the Dockerfile that was used to build it?
Regards, Chris