We're converting a PHP Docker image from Ubuntu to Alpine to reduce the image size, remove unnecessary dependencies, and decrease build time. Because of the PHP version we need to support, we can only use Alpine 3.10 for now.
One of the tools the application uses is wkhtmltopdf, which converts HTML files to PDFs. This works well for common English characters but struggles with other scripts such as Chinese or Thai.
To reproduce, use the Dockerfile and test.html below:
------- Dockerfile -------
FROM alpine:3.10
# --no-cache fetches a fresh package index and stores nothing in /var/cache/apk,
# so a separate "apk update" and cache cleanup are not needed
# (/var/cache/apt is an Ubuntu path and does not exist on Alpine).
# The font packages are every Alpine font package we could find.
RUN apk --no-cache add \
    git libcurl wget \
    curl tzdata procps vim \
    python3 py3-pip \
    zip unzip \
    libsasl \
    openssl \
    libpng \
    libjpeg \
    libjpeg-turbo \
    freetype \
    libxml2 \
    fontconfig \
    icu libzip \
    wkhtmltopdf \
    libgcc libstdc++ libx11 glib libxrender libxext libintl \
    font-noto-arabic terminus-font ttf-inconsolata ttf-dejavu font-noto font-noto-extra \
    ttf-droid ttf-freefont ttf-liberation ttf-ubuntu-font-family \
    libpng-dev libjpeg-turbo-dev freetype-dev libxml2-dev icu-dev autoconf gcc g++ make libzip-dev
COPY ./test.html ./
------- test.html -------
<html>
<body>
<p>English</p>
<p>電子郵件</p>
</body>
</html>
$ docker build -t character_test .
$ docker run --name character_test character_test wkhtmltopdf ./test.html ./test.pdf
$ docker cp character_test:./test.pdf ./test.pdf
$ docker rm character_test
$ docker rmi character_test
If you now open test.pdf, the English line renders correctly, but the Chinese line does not match the characters in the HTML file.
As the Dockerfile shows, we've installed just about every font package available for Alpine in an attempt to fix this, but we're not sure what the underlying problem is or how to resolve it.
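One way to test the assumption that a suitable font is installed is to ask fontconfig which installed fonts declare coverage for Chinese (zh) and Thai (th). This is a minimal sketch, assuming the fc-list utility from the fontconfig package is on the PATH in the image and that wkhtmltopdf finds its fonts through fontconfig; the script name check_fonts.sh is hypothetical. It can be run against the image before the docker rmi step with: docker run --rm -i character_test sh < check_fonts.sh

```shell
#!/bin/sh
# Sketch: report whether fontconfig knows of any installed font that covers
# the languages rendering incorrectly. Assumes fc-list (fontconfig package)
# is available inside the container; check_fonts.sh is a hypothetical name.

# has_font_for LANG: succeeds if fc-list returns at least one font family
# declaring coverage for the fontconfig language tag LANG.
has_font_for() {
    [ -n "$(fc-list ":lang=$1" family 2>/dev/null)" ]
}

# zh = Chinese, th = Thai
for lang in zh th; do
    if has_font_for "$lang"; then
        echo "$lang: covered by:"
        fc-list ":lang=$lang" family
    else
        echo "$lang: no installed font reports coverage"
    fi
done
```

An empty result for a language would suggest that none of the installed packages actually provides glyphs for that script, regardless of how many font packages the Dockerfile lists.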
What is causing these characters to display incorrectly and how can we resolve it in our image?