I am trying to install PyMuPDF in the official Python 3.8 Alpine Docker image. The Dockerfile looks like this:
FROM python:3.8-alpine
RUN apk add --update --no-cache \
    gcc g++ \
    libc-dev \
    python3-dev \
    build-base \
    cairo-dev \
    cairo \
    cairo-tools \
    jpeg-dev \
    zlib-dev \
    freetype-dev \
    lcms2-dev \
    openjpeg-dev \
    tiff-dev \
    tk-dev \
    tcl-dev \
    mupdf-dev \
    musl-dev \
    jbig2dec \
    harfbuzz-dev \
    vim bash
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir --upgrade pip && \
    pip install --cache-dir .pip-cache -r requirements.txt && \
    rm -rf .pip-cache
The version of PyMuPDF I'm trying to install is 1.20.1.
Building this image fails with this error:
#10 137.0 × Encountered error while trying to install package.
#10 137.0 ╰─> PyMuPDF
As I understand it, no PyMuPDF wheel is available for Alpine Linux, so pip has to build the package from source. Scrolling up a bit in the terminal, I see this:
#10 124.9 scripts/tesseract/endianness.h:20:2: error: #error "I don't know what architecture this is!"
#10 124.9 20 | #error "I don't know what architecture this is!"
#10 124.9 | ^~~~~
#10 124.9 make: *** [Makefile:133: build/release/source/fitz/tessocr.o] Error 1
So it looks like building PyMuPDF fails because the tesseract code (scripts/tesseract/endianness.h) cannot determine the endianness of this environment. How can I get past this?
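In case it helps with diagnosis, here is a small sketch of how the platform details could be checked from inside the container. It only reports what the Python interpreter sees (byte order, machine type, libc), which is not necessarily what the preprocessor check in endianness.h tests for. The function name `describe_platform` is just an illustrative name of mine.

```python
import platform
import sys


def describe_platform():
    """Collect basic platform facts as seen by the Python interpreter."""
    # libc_ver() may come back empty on non-glibc systems, hence the fallback.
    libc_name, libc_version = platform.libc_ver()
    return {
        "byteorder": sys.byteorder,        # "little" or "big"
        "machine": platform.machine(),     # CPU architecture string as reported by the OS
        "libc": (libc_name + " " + libc_version).strip() or "unknown",
    }


if __name__ == "__main__":
    for key, value in describe_platform().items():
        print(f"{key}: {value}")
```

Running this inside the python:3.8-alpine container would at least show whether the machine type and byte order are what I expect.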
If you have a working example of installing PyMuPDF in this docker image, please let me know. Thanks in advance.