
I have an Azure DevOps pipeline that builds Docker images to be run on different IoT Edge devices. The devices have very poor internet connections, therefore a small Docker diff size is crucial.

The codebase consists of a TypeScript Node.js server (hence the yarn build step), which needs node_modules that require building with make, gcc, etc.

When I run the Docker build locally on my machine, Docker reuses intermediate layers from previous builds, so the diff reported by docker history <image_id> is just 700 kB (the actual codebase). I can see in the build logs that yarn install is taken from the cache, so the layer hash stays identical.

When the image is built in Azure, the diff becomes 90 MB (the entire node_modules is copied). I have extracted node_modules from each image and compared hashes for all files in each folder with HashMyFiles.exe, checking both SHA-1 and SHA-256; every file matches.

But the hash of the uncompressed tar is not identical, per this post about how Docker layers are hashed: How Docker calculates the hash of each layer? Is it deterministic?
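
For reference, one way to extract and hash the uncompressed layer tars yourself on a Linux host (this assumes the classic docker save layout, where each layer is stored as a layer.tar whose SHA-256 is that layer's diff ID):

docker save a8f3453f4c1c -o image.tar
mkdir extracted
tar -xf image.tar -C extracted
# each layer directory holds a layer.tar; its sha256 is that layer's diff ID
find extracted -name layer.tar -exec sha256sum {} +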


So the question is: how can I avoid having to pull the whole node_modules layer for every code change when the image is built in Azure?


One solution we have discussed is to build a base Node image with our desired node_modules preinstalled. But this is not preferred, as it means extra work whenever modules change.
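
For illustration, a rough sketch of that approach (the registry and image names are placeholders): the base image below would be rebuilt and pushed only when package.json or yarn.lock changes, and the application Dockerfile would start FROM it instead of the stock Node image.

# base.Dockerfile - rebuilt and pushed only when dependencies change
FROM mhart/alpine-node:10
RUN apk add --no-cache make gcc g++ python linux-headers udev
WORKDIR /app
COPY package.json yarn.lock ./
RUN yarn install

# the application Dockerfile would then begin with:
# FROM myregistry.azurecr.io/node-with-modules:latest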

Docker history from two identical codebases built in Azure.

Build 1:

PS C:\temp\cby> docker history a8f3453f4c1c
IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
a8f3453f4c1c        2 hours ago         /bin/sh -c #(nop)  CMD ["node" "./dist/index…   0B
<missing>           2 hours ago         /bin/sh -c #(nop)  ENV NODE_ENV=production      0B
<missing>           2 hours ago         /bin/sh -c mkdir ./logs/                        0B
<missing>           2 hours ago         /bin/sh -c yarn run build                       3.34MB
<missing>           2 hours ago         /bin/sh -c #(nop) COPY dir:31a5b4423ce7e6928…   323kB
<missing>           2 hours ago         /bin/sh -c #(nop) COPY dir:a234dce19106582d9…   93.7MB
<missing>           2 hours ago         /bin/sh -c #(nop) WORKDIR /app                  0B
<missing>           2 hours ago         /bin/sh -c apk add --no-cache udev              1.83MB
<missing>           2 days ago                                                          70.2MB              merge sha256:eef5dfda7c2565cba57f222376d551426487839af67cf659bb3bb4fa51ef688a to sha256:6d1ef012b5674ad8a127ecfa9b5e6f5178d171b90ee462846974177fd9bdd39f
<missing>           2 days ago          /bin/sh -c rm -rf latest.tar.gz* /tmp/*     …   0B
<missing>           2 days ago          /bin/sh -c apk del curl gnupg                   0B
<missing>           2 days ago          /bin/sh -c curl -sfSL -O https://yarnpkg.com…   0B
<missing>           2 days ago          /bin/sh -c for server in ipv4.pool.sks-keyse…   0B
<missing>           2 days ago          /bin/sh -c /usr/lib/node_modules/npm/bin/npm…   0B
<missing>           2 days ago          /bin/sh -c apk upgrade --no-cache -U &&   ap…   0B
<missing>           2 days ago          /bin/sh -c #(nop) COPY file:fc6fb2d3d0d591f8…   0B
<missing>           2 days ago          /bin/sh -c #(nop) COPY dir:3d23406cd5b322399…   0B
<missing>           2 days ago          /bin/sh -c #(nop) COPY dir:857b32a43b41ef438…   0B
<missing>           3 days ago          /bin/sh -c #(nop) COPY file:20cc2cc5b0ae7508…   0B
<missing>           10 months ago       /bin/sh -c #(nop)  CMD ["/bin/sh"]              0B
<missing>           10 months ago       /bin/sh -c #(nop) ADD file:aa17928040e31624c…   4.21MB

Build 2:

PS C:\temp\cby> docker history 2fc80525d55e
IMAGE               CREATED              CREATED BY                                      SIZE                COMMENT
2fc80525d55e        45 seconds ago       /bin/sh -c #(nop)  CMD ["node" "./dist/index…   0B
<missing>           46 seconds ago       /bin/sh -c #(nop)  ENV NODE_ENV=production      0B
<missing>           46 seconds ago       /bin/sh -c mkdir ./logs/                        0B
<missing>           48 seconds ago       /bin/sh -c yarn run build                       3.34MB
<missing>           57 seconds ago       /bin/sh -c #(nop) COPY dir:31a5b4423ce7e6928…   323kB
<missing>           About a minute ago   /bin/sh -c #(nop) COPY dir:a234dce19106582d9…   93.7MB
<missing>           About a minute ago   /bin/sh -c #(nop) WORKDIR /app                  0B
<missing>           About a minute ago   /bin/sh -c apk add --no-cache udev              1.83MB
<missing>           2 days ago                                                           70.2MB              merge sha256:eef5dfda7c2565cba57f222376d551426487839af67cf659bb3bb4fa51ef688a to sha256:6d1ef012b5674ad8a127ecfa9b5e6f5178d171b90ee462846974177fd9bdd39f
<missing>           2 days ago           /bin/sh -c rm -rf latest.tar.gz* /tmp/*     …   0B
<missing>           2 days ago           /bin/sh -c apk del curl gnupg                   0B
<missing>           2 days ago           /bin/sh -c curl -sfSL -O https://yarnpkg.com…   0B
<missing>           2 days ago           /bin/sh -c for server in ipv4.pool.sks-keyse…   0B
<missing>           2 days ago           /bin/sh -c /usr/lib/node_modules/npm/bin/npm…   0B
<missing>           2 days ago           /bin/sh -c apk upgrade --no-cache -U &&   ap…   0B
<missing>           2 days ago           /bin/sh -c #(nop) COPY file:fc6fb2d3d0d591f8…   0B
<missing>           2 days ago           /bin/sh -c #(nop) COPY dir:3d23406cd5b322399…   0B
<missing>           2 days ago           /bin/sh -c #(nop) COPY dir:857b32a43b41ef438…   0B
<missing>           3 days ago           /bin/sh -c #(nop) COPY file:20cc2cc5b0ae7508…   0B
<missing>           10 months ago        /bin/sh -c #(nop)  CMD ["/bin/sh"]              0B
<missing>           10 months ago        /bin/sh -c #(nop) ADD file:aa17928040e31624c…   4.21MB

My Dockerfile (I have tried multiple Dockerfiles, with and without multi-stage builds, with the same result: Azure produces a large diff to download):

# Stage 0: install node_modules with the full C toolchain available
FROM mhart/alpine-node:10
RUN apk add --no-cache make gcc g++ python linux-headers udev
WORKDIR /app

# Install node modules first (avoids reinstalling for every source code change).
COPY package.json yarn.lock ./
RUN yarn install


# Stage 1: runtime image; only udev is needed, the toolchain is left behind
FROM mhart/alpine-node:10
RUN apk add --no-cache udev
WORKDIR /app
COPY --from=0 /app/node_modules ./node_modules
COPY . .
RUN yarn run build

RUN mkdir ./logs/

ENV NODE_ENV production

CMD ["node", "./dist/index.js"]

.dockerignore

node_modules
/build
.DS_Store
.env.local
.env.development.local
.env.test.local
.env.production.local

npm-debug.log*
yarn-debug.log*
yarn-error.log*

/logs
/tests
/testlogs
/dist
/ota

.vscode
.git

EDIT + Temporary solution

We ended up using a self-hosted build agent. It's a bit more expensive, but we get much faster build times and a correct cache for each operation. And, most importantly, much smaller download sizes.

I'm still not sure why docker build produces a new hash for every operation each time we run a build on the hosted agents.
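
A quick way to compare the layer digests of two images directly (RootFS.Layers lists the sha256 diff IDs of the uncompressed layer tars):

docker image inspect --format '{{json .RootFS.Layers}}' a8f3453f4c1c
docker image inspect --format '{{json .RootFS.Layers}}' 2fc80525d55e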

Even if the hashes were identical, build times would still be slow, because Azure's hosted build agents start from a clean machine each time.
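
For reference, the usual way to seed the cache on a clean agent is to push the image to a registry, pull it at the start of the next build, and pass it to docker build with --cache-from. A minimal sketch (the registry name is a placeholder; with a multi-stage build the intermediate stage must be pushed and listed as well, and we did not verify that this yields identical layer hashes on the hosted agents):

# pull the previous image so its layers are available as a cache source
docker pull myregistry.azurecr.io/myapp:latest || true

# tell docker build it may reuse layers from the pulled image
docker build --cache-from myregistry.azurecr.io/myapp:latest -t myregistry.azurecr.io/myapp:latest .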

2 Answers


For what you are doing, you don't really need a two-stage build; one is enough. But there are a few general issues with your approach.

Before running yarn install you only need to copy package*.json, not everything in the context (keep in mind that your local context already contains node_modules, but your remote build server doesn't).

When you do the following:

COPY --from=0 /app/node_modules ./node_modules
COPY . .

you actually overwrite the node_modules folder with whatever is in your context, so your previous steps were essentially useless.

I suggest you try something like this:

FROM mhart/alpine-node:10
RUN apk add --no-cache udev
WORKDIR /app

COPY package*.json ./
RUN yarn install

COPY index.js ./
# also copy here any other project files if any

RUN yarn run build

RUN mkdir ./logs/

ENV NODE_ENV production

CMD ["node", "./dist/index.js"]
– Mihai
  • Your Dockerfile works as expected locally, but I get the same result when building in Azure. I have included my .dockerignore file in my question, which should keep node_modules out of the Docker context. The two-stage build is there to avoid shipping all the build dependencies needed for yarn install: without it the image is 482MB, with two stages and copying the modules it is 174MB. Thanks for the suggestion, but unfortunately not the solution. – Mathias Haugsbø Jan 17 '20 at 10:14

Docker caches the results of individual steps in the Dockerfile. This is fairly all-or-nothing: if the previous step was cached and you're doing a step that's identical to something you've done before, then docker build will use the cached result; but once a step isn't identical, nothing at all will come from the cache from then on.

In particular, in your build stage, when you

COPY . ./

this invalidates the cache whenever any file changes; then when you run yarn install on the next line, it will almost always be repeated. At that point you only actually need the package metadata files, so you can instead

COPY package.json yarn.lock ./
RUN yarn install

and that will not get repeated on rebuilds.


If image size is a concern, you can also run yarn install --production to skip the devDependencies in your package.json. In typical use you'd do this in your final runtime image, but in your case you need a C toolchain to build those dependencies. That means you'd have three stages in your Dockerfile (see the sketch after the list):

  1. Based on Node, plus a C toolchain, that installs all of the dependencies and runs yarn build
  2. Based on Node, plus a C toolchain, that only runs yarn install --production
  3. Based on Node, that only COPY --from=... the built application from the first stage and the runtime node_modules from the second stage, and then has the usual EXPOSE and CMD metadata
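
A minimal sketch of that three-stage layout, based on the Dockerfile in the question (the stage names and the /app/dist path are assumptions):

# Stage 1 (build): install everything and compile the TypeScript; needs the C toolchain
FROM mhart/alpine-node:10 AS build
RUN apk add --no-cache make gcc g++ python linux-headers udev
WORKDIR /app
COPY package.json yarn.lock ./
RUN yarn install
COPY . .
RUN yarn run build

# Stage 2 (deps): production-only dependencies; also needs the C toolchain
FROM mhart/alpine-node:10 AS deps
RUN apk add --no-cache make gcc g++ python linux-headers udev
WORKDIR /app
COPY package.json yarn.lock ./
RUN yarn install --production

# Stage 3 (runtime): no build tools, just the app and its runtime modules
FROM mhart/alpine-node:10
RUN apk add --no-cache udev
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY --from=build /app/dist ./dist
RUN mkdir ./logs/
ENV NODE_ENV production
CMD ["node", "./dist/index.js"]
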
– David Maze
  • Almost all of my dependencies are used at runtime, so there isn't much to save there. I have updated my Dockerfile to copy just package.json before yarn install, but the same thing happens when building in Azure. When building locally the diff is just 300kB. Thank you for the suggestions. – Mathias Haugsbø Jan 17 '20 at 11:42