I want to build a Docker image with Airflow. The app requires geospatial packages such as geopandas. The image build fails while installing Fiona, with this error:

FileNotFoundError: [Errno 2] No such file or directory: 'gdal-config': 'gdal-config'

I don't know exactly how to proceed. I don't have conda installed in the production environment, so I need to install geopandas using pip only.

Below is the relevant part of the Dockerfile:

COPY requirements.txt .
RUN pip install --user -r requirements.txt

Below is requirements.txt:

apache-airflow[crypto,celery,postgres,jdbc,mysql,s3,password]==1.10.12
werkzeug<1.0.0
pytz
pyOpenSSL
ndg-httpsclient
gspread
oauth2client
pyasn1
boto3
airtable
numpy
scipy
slackclient
area
google-api-python-client
sqlalchemy
pandas
celery[redis]==4.1.1
analytics-python
networkx
zenpy==2.0.22
pyarrow
google-auth
six==1.13.0
geopandas

I tried adding the required packages, including GDAL, separately to requirements.txt, but that fails with the same error. I want to run a DAG that uses the geopandas library inside Docker.

1 Answer

Installing packages into a Docker environment is no different from installing them into any other local environment, other than perhaps the desire to speed up the build. So I'll answer this to highlight a faster option, but any other question about installing geopandas is relevant here.

I'd give the geopandas installation guide a close read. It includes multiple warnings about the issue you're facing. The recommended way to install geopandas is with conda. You cannot install geopandas with pip without manually installing the dependencies, some of which cannot be installed with pip (for example the GDAL C library, which provides the gdal-config script your build cannot find). So you can do this, but simply calling pip install geopandas won't get you there.
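
If you do need to stay with pip, here is a rough sketch of the manual route as a Dockerfile fragment. It assumes a Debian- or Ubuntu-based base image, where the libgdal-dev package provides gdal-config; the package names are Debian's and may differ on other distributions, and the apt-get step has to run as root. I haven't tested it against your exact image.

```dockerfile
# Assumption: Debian/Ubuntu-based base image. If the image runs as an
# unprivileged user, switch to root before this step.
# libgdal-dev ships the gdal-config script that the Fiona build looks for;
# g++ is there in case pip has to compile Fiona from source.
RUN apt-get update \
    && apt-get install -y --no-install-recommends gdal-bin libgdal-dev g++ \
    && rm -rf /var/lib/apt/lists/*

# Same two lines as in the question, now run after GDAL is present
COPY requirements.txt .
RUN pip install --user -r requirements.txt
```

If you also pip-install the GDAL Python bindings, their version generally has to match the system library, which gdal-config --version reports.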

I'd recommend using miniforge or, especially since you're building a Docker container, mambaforge, its faster compiled cousin. mamba is a significantly faster drop-in replacement for conda, written to build environments in parallel, but it tends to crash harder with worse error messages. In my opinion the speedup is definitely worth it when working with Docker containers, and if you're struggling to debug something you can always fall back to conda, which comes installed with mamba.

Don't install Anaconda, which bundles conda with a huge number of packages from the defaults channel in your base environment; it will lead to a mix and match of channels. Generally, you should keep your base env clean, without any packages except those which explicitly manage channels themselves, such as an IDE. By using miniforge or mambaforge instead, you'll use the conda-forge channel by default.

To install mambaforge and then create a new geopandas environment:

curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh"
bash Mambaforge-$(uname)-$(uname -m).sh

# install whatever env you'd like here. try to build it in one command
# rather than iteratively installing dependencies
mamba create -n mynewenv -c conda-forge python=3.10 geopandas [other packages]
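
In a Docker build the installer has to run without prompts, and the install directory has to be on PATH or later steps will not find mamba. A hedged sketch of how that could look as a Dockerfile fragment: it assumes a Linux base image with curl and bash available, and the /opt/conda prefix and the env name geo are arbitrary choices of mine. I use the Miniforge3 installer here, which as far as I know ships mamba in recent releases; the Mambaforge installer takes the same -b and -p flags.

```dockerfile
# Assumption: curl and bash exist in the base image.
# -b runs the installer in batch (non-interactive) mode, -p sets the prefix.
RUN curl -L -o /tmp/miniforge.sh \
      "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh" \
    && bash /tmp/miniforge.sh -b -p /opt/conda \
    && rm /tmp/miniforge.sh

# Make conda and mamba visible to later RUN steps and to the container
ENV PATH=/opt/conda/bin:$PATH

# Build the whole environment in one command ("geo" is a placeholder name)
RUN mamba create -y -n geo -c conda-forge python=3.10 geopandas

# Sanity check: the import should succeed inside the new environment
RUN conda run -n geo python -c "import geopandas"
```

Anything that needs geopandas then has to run inside that environment, for example through conda run -n geo as in the last step.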
Michael Delgado
  • I tried running curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh" and then bash Mambaforge-$(uname)-$(uname -m).sh in my environment, but even after the installation finished I still get the error "zsh: command not found: mamba". Also, I don't want to create a new virtual env; I want to install geopandas in the main environment. – Govind Mishra Dec 13 '22 at 05:07