Should each Docker image contain a JDK?

Question

So, I'm very new to Docker. Let me explain the context of the question.

I have 10-20 Spring Boot microservice applications, each running on a different port on my local machine.
To migrate to Docker, based on what I have learned, each of the services must be in a separate Docker container so that it can be deployed or replicated quickly.
For each Docker container, we need to create a new Docker image.
Each Docker image must contain a JRE for the Spring Boot application to run. The JRE is around 200 MB at most, which means each Docker image is, say, 350 MB at most. On my local PC, on the other hand, I have only one 200 MB JRE, and each application takes only a few MB of space.
Based on this, I would need 600 MB on my local system, but 7 GB for all the Docker images.

Is this approach correct? Should "OpenJDK" from DockerHub be added to each image?

Why is the size of the image large even if the target PC may already have the JDK?

You seem to be talking about both the JDK and the JRE. Ideally you would avoid building the images with the JDK, as you only need it at build time, and just have the JRE in the production image. Note that you can have multiple `FROM`s in a Dockerfile, so you can build with the JDK and then package with only the JRE. – mcfedr, Oct 17 '18 at 12:57
Indeed. Take a look at [multistage builds](https://www.google.com/search?q=docker+multistage+build). This allows you to build with the JDK in one image, then copy the built artefacts into a lighter run-time image. — spender, Oct 17 '18 at 15:14
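A toy Python model of what the multi-stage builds mentioned in these comments buy you: only the layers of the last stage end up in the shipped image, so the JDK and build-cache layers of the build stage never reach production. The stage names, layer names and sizes (in MB) are made-up assumptions for illustration, not measurements of real images.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    """One FROM ... stage of a hypothetical multi-stage Dockerfile."""
    name: str
    layers: dict = field(default_factory=dict)  # layer name -> size in MB


def final_image_size(stages):
    """Only the last stage becomes the shipped image; earlier stages are discarded."""
    return sum(stages[-1].layers.values())


# Hypothetical sizes: the build stage needs a JDK and a dependency cache,
# the runtime stage only needs a JRE plus the jar copied from the build stage.
build = Stage("build", {"os": 80, "jdk": 300, "maven-cache": 150, "app.jar": 40})
runtime = Stage("runtime", {"os": 80, "jre": 120, "app.jar": 40})

print(final_image_size([build]))           # 570: a single stage ships the JDK too
print(final_image_size([build, runtime]))  # 240: multi-stage ships only the JRE
```

The build stage still costs time and disk on the build machine, but nothing from it except the copied jar is pushed to the registry or pulled onto servers.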

score 35 · Accepted Answer · edited Oct 17 '18 at 14:31

Your understanding is not correct.

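Docker images are formed from layers; see the following diagram:

[Diagram: read-only image layers, one with checksum 91e54dfb1179, stacked under a container's thin R/W layer]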

When you install a JRE in your image (suppose it is the layer with checksum 91e54dfb1179 in the diagram), it really does occupy your disk.

But if all your containers are based on the same image, and each one adds something different on top of it (say, your different microservice applications, in their own layers), all of them will share the 91e54dfb1179 layer, so disk usage will not grow as an n*m relationship.

Make a point of using the same base image for all your Java applications as much as possible, and add only the per-application differences on top of it.
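To make the "not n*m" point concrete, here is a toy Python model of content-addressed layer storage using the question's rough numbers (20 services, about 350 MB per image). The split into one shared 330 MB OS+JRE base layer plus a private 20 MB application layer per service, and the digest names, are assumptions for illustration only.

```python
def naive_total(images, sizes):
    """Disk usage (MB) if every image kept a private copy of every layer."""
    return sum(sizes[d] for layers in images.values() for d in layers)


def shared_total(images, sizes):
    """Disk usage (MB) when each distinct layer digest is stored only once."""
    unique = {d for layers in images.values() for d in layers}
    return sum(sizes[d] for d in unique)


# Hypothetical digests and sizes: one base layer (OS + JRE) shared by all
# services, plus one small application layer per service.
sizes = {"sha256:base-jre": 330}
images = {}
for i in range(20):
    app_layer = f"sha256:app-{i}"
    sizes[app_layer] = 20
    images[f"service-{i}"] = ["sha256:base-jre", app_layer]

print(naive_total(images, sizes))   # 7000: the "7 GB" estimate
print(shared_total(images, sizes))  # 730: 330 + 20 * 20
```

With a shared base, each additional service only adds the size of its own application layer.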

edited Oct 17 '18 at 14:31 by Peter Mortensen

answered Oct 17 '18 at 07:29 by atline

Good answer, but I have one more question. What if the Docker images are built on different systems, say each microservice is built by a separate team in a different geographic location? Then this sharing of the existing JRE layer by ID won't hold, right? – SamwellTarly Oct 17 '18 at 13:10
@SamwellTarly Use a good common base image when appropriate; this base image should contain the heavy common parts. – Christian Sauer Oct 17 '18 at 13:28
@SamwellTarly You need to gather the most common things, at least the JRE you care about most, into one custom base image. I suggest using Docker Hub or a private Docker registry to share it. Then every service team can add its own things on top of this base image. – atline Oct 17 '18 at 13:37
You should consider using [OpenJDK](https://hub.docker.com/_/openjdk/) as your base image. – JimmyJames Oct 17 '18 at 13:45
I doubt that using a common image across multiple containers means it occupies space only once. Each container actually installs all the libs required to run the image, so each container will have a copy of all the libs (and thus a separate copy of OpenJDK). – Jignesh M. Khatri Aug 23 '22 at 16:40

score 5 · Answer 2 · answered Oct 17 '18 at 17:02

The other answers cover Docker layering pretty well, so I just want to add details for your questions.

Is this approach correct? Should "OpenJDK" from DockerHub be added to each image?

Yes. If it's not in the image, it won't be in the container. You can save disk space, though, by reusing as many layers as possible. So try to write your Dockerfile from "least likely to change" to "most likely to change". When you build your image, the more often you see "Using cache", the better.
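A toy Python model of the cache rule behind this advice: a build step can be served from the cache only if it and every step before it are unchanged, so a single early change forces every later step to be rebuilt. The step strings and the "(v1)/(v2)" tags are hypothetical stand-ins for real Dockerfile instructions and file checksums.

```python
def cache_hits(old_steps, new_steps):
    """Number of leading build steps that can be reused from the cache."""
    hits = 0
    for old, new in zip(old_steps, new_steps):
        if old != new:
            break  # this step changed, so it and every later step are rebuilt
        hits += 1
    return hits


# Only the application jar changes between two builds (v1 -> v2).
good_old = ["FROM base-jre", "COPY libs (v1)", "COPY app.jar (v1)"]
good_new = ["FROM base-jre", "COPY libs (v1)", "COPY app.jar (v2)"]
bad_old = ["FROM base-jre", "COPY app.jar (v1)", "COPY libs (v1)"]
bad_new = ["FROM base-jre", "COPY app.jar (v2)", "COPY libs (v1)"]

print(cache_hits(good_old, good_new))  # 2: only the last step is rebuilt
print(cache_hits(bad_old, bad_new))    # 1: the unchanged libs step is rebuilt too
```

Putting the frequently changing jar last keeps the larger, stable dependency layer cached and shared.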

Why is the size of the image large even if the target PC may already have the JDK?

Docker wants as little to do with the host as possible. Docker images assume the only things the host will provide are RAM, disk, CPUs and a Linux kernel: containers share the host's kernel rather than bringing their own, and on macOS and Windows, Docker Desktop runs a lightweight Linux VM to provide one. So each Docker image must contain its own OS userland (everything except the kernel). That is what your initial FROM is doing: picking a base OS image to use. So your final image size is actually OS + tools + app. Image size is a little misleading, though, as it is the sum of all layers, which are reused across images.

(Implied) Should each app/micro-service be in its own container?

Ideally, yes. Converting your app into isolated modules makes it easier to replace or load-balance each module.

In practice, maybe not (for you). Spring Boot is not a light framework. In fact, it is a framework for modularizing your code (effectively running a module control system inside a module control system). And now you want to host 10-20 of them? That probably won't fit on a single server. Docker forces each app to load its own copy of Spring Boot into memory, and objects can no longer be reused across modules, so those need to be instantiated multiple times too! And if you are restricted to one production server, horizontal scaling isn't an option. (You will need ~1 GB of heap (RAM) per Spring Boot app; your mileage may vary depending on your code base.) With 10-20 apps, refactoring to make them lighter for Docker deployment may not be feasible or in budget. Not to mention that if you can't run a minimal setup locally for testing (insufficient RAM), development will get a lot more "fun".
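A back-of-the-envelope Python sketch of the single-server concern. Unlike image layers on disk, heap memory is not shared between containers, so each JVM pays for its own. Only the ~1 GB heap figure comes from this answer; the 16 GB server, the 2 GB reserve for the OS and Docker, and the 300 MB of non-heap JVM memory (metaspace, thread stacks, code cache) are hypothetical assumptions.

```python
def containers_that_fit(server_ram_mb, reserved_mb, heap_mb, non_heap_mb):
    """How many JVM containers fit in the RAM left after the OS/Docker reserve."""
    per_container = heap_mb + non_heap_mb  # each container pays for its own JVM
    return max(0, (server_ram_mb - reserved_mb) // per_container)


# Hypothetical 16 GB server with 2 GB kept for the OS and Docker itself.
print(containers_that_fit(16384, 2048, heap_mb=1024, non_heap_mb=300))  # 10
```

Under these assumptions, 20 services would need about 20 * 1324 + 2048 = 28,528 MB, unless the per-service footprint can be reduced.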

Docker is not a golden hammer. Give it a try, evaluate the pros and cons yourself, and decide if the pros are worth the cons for you and your team(s).

answered Oct 17 '18 at 17:02 by Tezra

I like your answer, but it is also thought-provoking. What alternative would you suggest to running each microservice as a Spring Boot application? That allows very loose coupling and no deployment step, unlike older, bigger Spring applications. The microservices can talk amongst themselves. So in this case, on the machine where the Docker images run, won't they all use the same JRE and eliminate the need for a 1 GB heap per container? – SamwellTarly Oct 18 '18 at 07:11
@SamwellTarly The containers will share (most of) the base image, but their runtime memory (the R/W layer and RAM) is isolated per container. So every container's JVM needs to load the resources it uses into memory (and Spring Boot uses A LOT of resources). Docker is actually based on the [12 Factor App](https://12factor.net/) design philosophy, which assumes your microservices were all designed to run on separate VMs/machines. One compromise, though, would be to put it all in one Docker container at first, and then create more containers as you refactor for lighter deployment. – Tezra Oct 18 '18 at 12:52
@SamwellTarly The smaller your final image and the lighter the final RAM footprint, the faster you can start the containers (which is going to be a big deal if you want to take advantage of Docker container scaling/load balancing). Even if you use just one container, it (mostly) solves the "works on my machine" issue. For a more targeted answer, it would be better to ask another question about whatever problem you are trying to solve by switching to Docker. – Tezra Oct 18 '18 at 12:58
Yes, I understand that the container, including its RAM usage, must be minimal. However, Amazon's own cloud tutorial runs each microservice as a Spring Boot application. The base JVM will ask for a RAM mapping of 2 GB. However, each microservice uses very little RAM (10 MB) on my local PC. If it needs more RAM, won't the cluster manager handle that? Can you point me to your source stating that Spring Boot is heavy and needs a lot of RAM on a cloud platform? – SamwellTarly Oct 19 '18 at 13:29
@SamwellTarly If RAM is not an issue, then obviously this isn't a problem. If you have a finite server resource limit, then the cluster manager cannot allocate more resources than the cluster has. Of course, your first major issue with Java + containers (if you aren't on a container-aware JVM, i.e. JDK 10+ or 8u191+) is that Java will over-allocate heap from the cluster. I can't point you to hard numbers about Spring being heavy, because the blogs about it run superficial tests that just prove "Spring is light on paper", but I've seen in practice that Spring can add tremendous start-up and run-time overhead (up to 5x). – Tezra Oct 19 '18 at 13:57
@SamwellTarly That's not entirely Spring's fault, though. Like I said, this will largely boil down to how your codebase has been designed. If done "correctly", Spring Boot can be used with minimal overhead; if done wrong, it will exacerbate everything that is wrong in your codebase. That's why I say try it for yourself, but don't assume Docker will solve everything for you. It can make things worse. You can tune various things to compensate, but that tuning process isn't always in budget. Only you can decide that, though. – Tezra Oct 19 '18 at 14:01
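A simplified Python sketch of the heap over-allocation mentioned in this thread. Assumptions: the JVM's default maximum heap is a quarter of the memory it detects (`-XX:MaxRAMPercentage=25` on newer JVMs, `MaxRAMFraction=4` on older ones), and only container-aware JVMs (JDK 10+, backported to 8u191) detect the container's memory limit instead of the host's RAM. Real JVM ergonomics have more cases (small limits, for example, use a different percentage), so treat the numbers as illustrative.

```python
def default_max_heap_mb(host_ram_mb, container_limit_mb, container_aware):
    """Simplified JVM default max heap: 25% of the memory the JVM detects."""
    # Older JVMs saw the host's RAM; container-aware ones read the container limit.
    detected_mb = container_limit_mb if container_aware else host_ram_mb
    return detected_mb * 25 // 100


# A container limited to 1 GB on a 64 GB host.
print(default_max_heap_mb(65536, 1024, container_aware=False))  # 16384: far above the limit
print(default_max_heap_mb(65536, 1024, container_aware=True))   # 256
```

In the first case the heap is allowed to grow past the container's limit, at which point the container can be OOM-killed instead of the JVM running its garbage collector harder.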

score 2 · Answer 3 · edited Oct 17 '18 at 14:33

Lagom's answer is great, but I'd like to add that the size of Docker containers should be as small as reasonably possible to ease transfer and storage.

That is why there are a lot of images based on the Alpine Linux distribution, which is really small. Try to use them if possible.

Furthermore, do not add every tool imaginable to your container, e.g. you can often do without wget...

edited Oct 17 '18 at 14:33 by Peter Mortensen

answered Oct 17 '18 at 11:20 by Christian Sauer

Not just `wget`, of course - I've seen production Docker images with all sorts of silly stuff inside, up to and including a full GCC distribution (in a PHP application). – Sebastian Lenartowicz Oct 17 '18 at 11:42
@SebastianLenartowicz Funny! Why? Most stuff I have seen is there for testing or to build a Python package. Most people tend not to use multi-stage builds, which would prevent this particular problem. – Christian Sauer Oct 17 '18 at 11:48
Understood. So a strong design with maximum inheritance is needed. – SamwellTarly Oct 17 '18 at 13:13
@ChristianSauer Because the Docker images were built by people with an incomplete understanding of their purpose. They imagined they needed a whole Unix-y system inside, so they could modify and administer it while it was running (I know, I know). – Sebastian Lenartowicz Oct 17 '18 at 13:16
@SamwellTarly WARNING! It depends! Too much inheritance makes your whole project unwieldy. E.g. if you have multiple microservices deployed, it might be beneficial to have various Java versions, e.g. because one package has a bug that prevents it from working on the version you prefer for all other services. Strike a balance! Dev time is a consideration, too: getting Alpine images to work can be a pain if you need to install dependencies. – Christian Sauer Oct 17 '18 at 13:27

davidxxx · Answer 4 · 2020-02-29T09:33:33.497

Based on this, I would need 600 MB on my local system, yet need 7 GB for all Docker images.

Is this approach correct? Should "OpenJDK" from DockerHub be added to each image?

That is correct, although you could wonder whether a JRE would be enough.

Why is the size of the image large even if the target PC may already have the JDK?

You are comparing things that are not comparable: a local environment (which is anything but a production machine) vs. integration/production environments.

In an integration/production environment, the load on your applications may be high, and isolation between applications is generally advised. So there you want to host a minimal number of applications (UI/services) per machine (bare metal, VM or container) to prevent side effects between applications: shared library incompatibilities, side effects of software upgrades, resource starvation, chained failures between applications...

In a local environment, by contrast, the load on your applications is quite low and isolation between applications is generally not a serious issue. So there you can host multiple applications (UI/services) on your local machine, and you can also share some common libraries/dependencies provided by the OS. But even though you can do that, is it really good practice to mix and share everything locally? I don't think so, because:
1) Your local machine is not a dumping ground: you work on it the whole day, and the cleaner it is, the more efficient your development is. For example, the JDK/JRE may differ between the applications hosted locally, some folders used by the applications may have the same location, the database version may differ, and applications may need different Java servers (Tomcat, Netty, WebLogic) or different versions of them...
With containers, that is not an issue: everything is installed and removed according to your requirements.

2) Environments (from local to prod) should be as close as possible to each other, to ease the whole integration/deployment chain and to detect issues early rather than only in production.

As a side note, to achieve that locally, developers need a real machine.

Everything has a cost, but this one is actually not expensive.

Besides isolation (of hardware and software resources), containers bring other advantages, such as fast deploy/undeploy, scalability and failover-friendliness (for example, Kubernetes relies on containers).
Isolation, speed, scalability and robustness come at a cost: containers do not share the host's userland resources (OS libraries, JVM, ...); each one brings its own.

That means that even if all your applications use exactly the same OS, libraries and JVM, each application has to include them in its image.
Is it expensive? Not really: official images often rely on Alpine (a light Linux OS with limitations, but customizable if needed), and what does a 350 MB image (the value you quote, which is realistic) represent in terms of cost?
In fact, it is really cheap. In integration/production, your services will very probably not all be hosted on the same machine, so compare the 350 MB of a container with the resources used by traditional integration/production VMs, which contain a complete OS with multiple additional programs installed. You can see that the resource consumption of containers is not an issue. It is even considered an advantage beyond local environments.
