I have an Azure DevOps pipeline that builds Docker images which run on various IoT Edge devices. The devices have a very poor internet connection, so a small Docker diff size is crucial.
The codebase consists of a TypeScript Node.js server (hence yarn build), which needs node_modules that require building with make, gcc, etc.
When I run the Docker build locally on my machine, Docker reuses intermediate layers from previous builds, so the diff shown by docker history <image_id> is just 700 kB (the actual codebase). I can see in the build logs that yarn install is taken from cache, so the layer hash is identical.
When building the image in Azure, the diff becomes 90 MB (the entire node_modules is copied).
I have extracted node_modules from each image and compared the hashes of all files in each folder with HashMyFiles.exe, comparing both SHA-1 and SHA-256.
However, the hash of the uncompressed tar is not identical; see this post about how Docker layers are hashed: How Docker calculates the hash of each layer? Is it deterministic?
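To illustrate the point from the linked post: a layer digest is computed over a tar archive, and a tar archive records metadata such as modification times in addition to file contents. The following is a minimal sketch, assuming a POSIX shell with tar, touch and sha256sum available; the file name and timestamps are made up for the demonstration. It only shows that identical file contents can still produce different tar hashes, and this may not be the only reason the layers built in Azure differ.

```shell
# Create two directories holding a byte-identical file.
work=$(mktemp -d)
mkdir "$work/a" "$work/b"
printf 'module.exports = 1;\n' > "$work/a/index.js"
printf 'module.exports = 1;\n' > "$work/b/index.js"

# Give the two copies different modification times, as two separate
# "yarn install" runs on fresh machines would.
touch -t 202001010000 "$work/a/index.js"
touch -t 202101010000 "$work/b/index.js"

# Hash the file contents only.
file_a=$(sha256sum < "$work/a/index.js")
file_b=$(sha256sum < "$work/b/index.js")

# Hash the uncompressed tar stream, which also includes the mtime.
tar_a=$(tar -C "$work/a" -cf - index.js | sha256sum)
tar_b=$(tar -C "$work/b" -cf - index.js | sha256sum)

echo "file a: $file_a"
echo "file b: $file_b"
echo "tar  a: $tar_a"
echo "tar  b: $tar_b"

rm -rf "$work"
```

The two file hashes should match while the two tar hashes should not, which is consistent with a per-file comparison finding no differences even though the layer hash changes.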
So the question is: how can I avoid having to pull the whole node_modules for each code change when building the image in Azure?
One solution we have discussed is to build a Docker Node image with our desired node_modules preinstalled. But this is not preferred, and it requires extra work whenever the modules change.
Docker history from two identical codebases built in Azure:
1
PS C:\temp\cby> docker history a8f3453f4c1c
IMAGE CREATED CREATED BY SIZE COMMENT
a8f3453f4c1c 2 hours ago /bin/sh -c #(nop) CMD ["node" "./dist/index… 0B
<missing> 2 hours ago /bin/sh -c #(nop) ENV NODE_ENV=production 0B
<missing> 2 hours ago /bin/sh -c mkdir ./logs/ 0B
<missing> 2 hours ago /bin/sh -c yarn run build 3.34MB
<missing> 2 hours ago /bin/sh -c #(nop) COPY dir:31a5b4423ce7e6928… 323kB
<missing> 2 hours ago /bin/sh -c #(nop) COPY dir:a234dce19106582d9… 93.7MB
<missing> 2 hours ago /bin/sh -c #(nop) WORKDIR /app 0B
<missing> 2 hours ago /bin/sh -c apk add --no-cache udev 1.83MB
<missing> 2 days ago 70.2MB merge sha256:eef5dfda7c2565cba57f222376d551426487839af67cf659bb3bb4fa51ef688a to sha256:6d1ef012b5674ad8a127ecfa9b5e6f5178d171b90ee462846974177fd9bdd39f
<missing> 2 days ago /bin/sh -c rm -rf latest.tar.gz* /tmp/* … 0B
<missing> 2 days ago /bin/sh -c apk del curl gnupg 0B
<missing> 2 days ago /bin/sh -c curl -sfSL -O https://yarnpkg.com… 0B
<missing> 2 days ago /bin/sh -c for server in ipv4.pool.sks-keyse… 0B
<missing> 2 days ago /bin/sh -c /usr/lib/node_modules/npm/bin/npm… 0B
<missing> 2 days ago /bin/sh -c apk upgrade --no-cache -U && ap… 0B
<missing> 2 days ago /bin/sh -c #(nop) COPY file:fc6fb2d3d0d591f8… 0B
<missing> 2 days ago /bin/sh -c #(nop) COPY dir:3d23406cd5b322399… 0B
<missing> 2 days ago /bin/sh -c #(nop) COPY dir:857b32a43b41ef438… 0B
<missing> 3 days ago /bin/sh -c #(nop) COPY file:20cc2cc5b0ae7508… 0B
<missing> 10 months ago /bin/sh -c #(nop) CMD ["/bin/sh"] 0B
<missing> 10 months ago /bin/sh -c #(nop) ADD file:aa17928040e31624c… 4.21MB
2
PS C:\temp\cby> docker history 2fc80525d55e
IMAGE CREATED CREATED BY SIZE COMMENT
2fc80525d55e 45 seconds ago /bin/sh -c #(nop) CMD ["node" "./dist/index… 0B
<missing> 46 seconds ago /bin/sh -c #(nop) ENV NODE_ENV=production 0B
<missing> 46 seconds ago /bin/sh -c mkdir ./logs/ 0B
<missing> 48 seconds ago /bin/sh -c yarn run build 3.34MB
<missing> 57 seconds ago /bin/sh -c #(nop) COPY dir:31a5b4423ce7e6928… 323kB
<missing> About a minute ago /bin/sh -c #(nop) COPY dir:a234dce19106582d9… 93.7MB
<missing> About a minute ago /bin/sh -c #(nop) WORKDIR /app 0B
<missing> About a minute ago /bin/sh -c apk add --no-cache udev 1.83MB
<missing> 2 days ago 70.2MB merge sha256:eef5dfda7c2565cba57f222376d551426487839af67cf659bb3bb4fa51ef688a to sha256:6d1ef012b5674ad8a127ecfa9b5e6f5178d171b90ee462846974177fd9bdd39f
<missing> 2 days ago /bin/sh -c rm -rf latest.tar.gz* /tmp/* … 0B
<missing> 2 days ago /bin/sh -c apk del curl gnupg 0B
<missing> 2 days ago /bin/sh -c curl -sfSL -O https://yarnpkg.com… 0B
<missing> 2 days ago /bin/sh -c for server in ipv4.pool.sks-keyse… 0B
<missing> 2 days ago /bin/sh -c /usr/lib/node_modules/npm/bin/npm… 0B
<missing> 2 days ago /bin/sh -c apk upgrade --no-cache -U && ap… 0B
<missing> 2 days ago /bin/sh -c #(nop) COPY file:fc6fb2d3d0d591f8… 0B
<missing> 2 days ago /bin/sh -c #(nop) COPY dir:3d23406cd5b322399… 0B
<missing> 2 days ago /bin/sh -c #(nop) COPY dir:857b32a43b41ef438… 0B
<missing> 3 days ago /bin/sh -c #(nop) COPY file:20cc2cc5b0ae7508… 0B
<missing> 10 months ago /bin/sh -c #(nop) CMD ["/bin/sh"] 0B
<missing> 10 months ago /bin/sh -c #(nop) ADD file:aa17928040e31624c… 4.21MB
My Dockerfile is below. I have tried multiple Dockerfiles, with and without multi-stage builds, with the same result: Azure produces a large diff when the image is downloaded.
FROM mhart/alpine-node:10
RUN apk add --no-cache make gcc g++ python linux-headers udev
WORKDIR /app
# Install node modules first (avoids reinstalling for every source code change).
COPY package.json yarn.lock ./
RUN yarn install
FROM mhart/alpine-node:10
RUN apk add --no-cache udev
WORKDIR /app
COPY --from=0 /app/node_modules ./node_modules
COPY . .
RUN yarn run build
RUN mkdir ./logs/
ENV NODE_ENV production
CMD ["node", "./dist/index.js"]
.dockerignore
node_modules
/build
.DS_Store
.env.local
.env.development.local
.env.test.local
.env.production.local
npm-debug.log*
yarn-debug.log*
yarn-error.log*
/logs
/tests
/testlogs
/dist
/ota
.vscode
.git
EDIT + Temporary solution
We ended up using a self-hosted build agent. It is a bit more expensive, but we get much faster build times and a correct cache for each operation. Most importantly, we get much smaller download sizes.
I'm not sure why the Docker build produces a new hash for each operation on every build.
Build time would still be slow even if the hashes were the same, because Azure build agents start with a clean machine each time.
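For hosted agents that start clean, one approach that could be tried is to pull the previously pushed image and pass it to docker build with --cache-from, so the builder has earlier layers to reuse. This is a rough sketch and not something we verified in our pipeline. The function name build_with_cache, the DOCKER override variable and the image name in the usage line are hypothetical. Note that with a multi-stage Dockerfile the first stage is not part of the final image, so its layers would not be in the pulled cache unless that stage is also pushed under its own tag. Newer builders may additionally need cache metadata embedded in the image.

```shell
# DOCKER can be overridden (for example DOCKER=echo) to print the
# commands instead of running them. This variable is an assumption of
# the sketch, not a Docker feature.
DOCKER="${DOCKER:-docker}"

# Hypothetical helper: build an image using the previously pushed
# image as a layer cache source.
build_with_cache() {
  image="$1"

  # A fresh agent has no local layers, so fetch the last pushed image.
  # Ignore failure, since the very first build has nothing to pull.
  "$DOCKER" pull "$image" || true

  # Let the builder consider layers of the pulled image as cache.
  "$DOCKER" build --cache-from "$image" -t "$image" .

  # Publish the result so the next run can use it as its cache source.
  "$DOCKER" push "$image"
}
```

Usage, with a made-up registry and tag: build_with_cache myregistry.example.com/app:latest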