I have a Compose file with three services: database, backend, and frontend. The backend depends on the database being healthy, and the frontend depends on the backend being healthy.
The database (Postgres) checks its own health using pg_isready, and the backend (FastAPI) checks its health via the endpoint http://localhost:8080/healthcheck.
Compose file:
version: '3'
services:
  database:
    image: postgres:14-alpine
    healthcheck:
      test: pg_isready -U postgres
      interval: 1s
      timeout: 5s
      retries: 5
      start_period: 10s
  backend:
    depends_on:
      database:
        condition: service_healthy
    image: backend-api-image
    build:
      context: backend
      dockerfile: Dockerfile
    ports:
      - "8080:8080"
    volumes:
      - './backend:/backend'
    healthcheck:
      test: wget --no-verbose --tries=1 --spider http://localhost:8080/healthcheck || exit 1
      interval: 1s
      timeout: 5s
  frontend:
    image: my-frontend
    depends_on:
      backend:
        condition: service_healthy
    build:
      context: ./frontend
      dockerfile: Dockerfile
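For clarity, this is roughly what I understand the backend check to be doing, sketched in Python rather than wget. It is not what the container actually runs; `probe` is a name I made up for illustration, and the 5-second default only mirrors the `timeout: 5s` above.

```python
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 5.0) -> int:
    """Return 0 if the URL answers with a 2xx status, 1 otherwise.

    Hypothetical helper mirroring the exit code of the wget-based check.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 0 if 200 <= response.status < 300 else 1
    except (urllib.error.URLError, OSError):
        # urlopen raises HTTPError (a URLError subclass) for 4xx/5xx
        # responses such as a 404; refused connections and timeouts
        # also end up here.
        return 1
```

A script that ends with `sys.exit(probe("http://localhost:8080/healthcheck"))` would hand Docker the same 0/1 exit code as the `wget ... || exit 1` line.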
FastAPI app:
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)


@app.get('/healthcheck')
def get_healthcheck():
    return 'OK'
So far this all works as expected. If, for example, I had a typo in the healthcheck
endpoint route in my app, startup would fail, like so:
database | 2023-06-01 23:01:44.410 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
database | 2023-06-01 23:01:44.410 UTC [1] LOG: listening on IPv6 address "::", port 5432
database | 2023-06-01 23:01:44.411 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
database | 2023-06-01 23:01:44.414 UTC [22] LOG: database system was shut down at 2023-06-01 22:51:10 UTC
database | 2023-06-01 23:01:44.417 UTC [1] LOG: database system is ready to accept connections
backend | INFO: Will watch for changes in these directories: ['/backend']
backend | INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
backend | INFO: Started reloader process [1] using StatReload
backend | INFO: Started server process [8]
backend | INFO: Waiting for application startup.
backend | INFO: Application startup complete.
backend | INFO: 127.0.0.1:41294 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend | INFO: 127.0.0.1:41296 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend | INFO: 127.0.0.1:41298 - "GET /healthcheck HTTP/1.1" 404 Not Found
dependency failed to start: container backend is unhealthy
Where I'm getting confused is this: after a successful startup, if I change the app in a way
that should make the backend unhealthy, the container detects the change and the check returns a 404
(as expected), but the container never becomes unhealthy:
database | 2023-06-01 23:06:37.396 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
database | 2023-06-01 23:06:37.396 UTC [1] LOG: listening on IPv6 address "::", port 5432
database | 2023-06-01 23:06:37.397 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
database | 2023-06-01 23:06:37.400 UTC [22] LOG: database system was shut down at 2023-06-01 23:06:34 UTC
database | 2023-06-01 23:06:37.403 UTC [1] LOG: database system is ready to accept connections
backend | INFO: Will watch for changes in these directories: ['/backend']
backend | INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
backend | INFO: Started reloader process [1] using StatReload
backend | INFO: Started server process [9]
backend | INFO: Waiting for application startup.
backend | INFO: Application startup complete.
backend | INFO: 127.0.0.1:49450 - "GET /healthcheck HTTP/1.1" 200 OK
frontend |
frontend | > frontend@0.0.0 dev
frontend | > vite --host
frontend |
frontend | Forced re-optimization of dependencies
frontend |
frontend | VITE v4.3.1 ready in 285 ms
frontend |
frontend | ➜ Local: http://localhost:5173/
frontend | ➜ Network: http://172.26.0.4:5173/
backend | INFO: 127.0.0.1:57966 - "GET /healthcheck HTTP/1.1" 200 OK
backend | INFO: 127.0.0.1:57968 - "GET /healthcheck HTTP/1.1" 200 OK
backend | INFO: 127.0.0.1:57982 - "GET /healthcheck HTTP/1.1" 200 OK
backend | INFO: 127.0.0.1:57992 - "GET /healthcheck HTTP/1.1" 200 OK
backend | INFO: 127.0.0.1:58002 - "GET /healthcheck HTTP/1.1" 200 OK
backend | INFO: 127.0.0.1:58012 - "GET /healthcheck HTTP/1.1" 200 OK
backend | INFO: 127.0.0.1:58018 - "GET /healthcheck HTTP/1.1" 200 OK
backend | WARNING: StatReload detected changes in 'src/main.py'. Reloading...
backend | INFO: Shutting down
backend | INFO: Waiting for application shutdown.
backend | INFO: Application shutdown complete.
backend | INFO: Finished server process [9]
backend | INFO: Started server process [76]
backend | INFO: Waiting for application startup.
backend | INFO: Application startup complete.
backend | INFO: 127.0.0.1:58028 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend | INFO: 127.0.0.1:58040 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend | INFO: 127.0.0.1:35092 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend | INFO: 127.0.0.1:35098 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend | INFO: 127.0.0.1:35102 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend | INFO: 127.0.0.1:35116 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend | INFO: 127.0.0.1:35126 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend | INFO: 127.0.0.1:35134 - "GET /healthcheck HTTP/1.1" 404 Not Found
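The log above only shows the HTTP responses, not the state Docker records for the container. One way to read that state would be through `docker inspect`, sketched below in Python. The function names are made up for illustration, and the container name passed in is an assumption: Compose may name the container differently from the `backend` prefix in the logs.

```python
import json
import subprocess


def parse_health(inspect_json: str) -> tuple[str, int]:
    """Extract (Status, FailingStreak) from the JSON printed by
    docker inspect --format '{{json .State.Health}}' <container>."""
    health = json.loads(inspect_json)
    return health["Status"], health["FailingStreak"]


def container_health(name: str) -> tuple[str, int]:
    """Ask the Docker CLI for the recorded health state of one container.

    `name` is whatever `docker ps` reports for the service (assumption).
    """
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{json .State.Health}}", name],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_health(result.stdout)
```

Calling `container_health(...)` while the 404s are being logged would show whether the status is still `healthy` or has in fact moved to `unhealthy` without Compose reacting to it.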
What I expected:
While running after a successful startup, upon changing the backend
code in such a way that its healthcheck would fail, I expected frontend
to exit or become degraded somehow, as its health dependency has failed.
What happened:
Everything kept running as if nothing happened, even though the backend
healthcheck returned a failing value.
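To spell out where my expectation comes from: as I read the Docker documentation, a container is marked unhealthy after `retries` consecutive failed checks, and the default of 3 should apply to backend since it sets none. Below is a minimal sketch of that rule as I understand it. It ignores `start_period`, and `health_states` is a made-up name, not a Docker API.

```python
def health_states(probe_results, retries=3):
    """Return the health status after each probe.

    probe_results: iterable of booleans, True for a passing check.
    Models only the consecutive-failure rule; start_period is ignored.
    """
    status = "starting"
    failing_streak = 0
    states = []
    for ok in probe_results:
        if ok:
            # Any passing check resets the streak and marks the container healthy.
            failing_streak = 0
            status = "healthy"
        else:
            failing_streak += 1
            # The status only flips once `retries` failures occur in a row.
            if failing_streak >= retries:
                status = "unhealthy"
        states.append(status)
    return states


# Two passing checks followed by four failing ones, as in a reload that breaks the route.
print(health_states([True, True, False, False, False, False]))
# ['healthy', 'healthy', 'healthy', 'healthy', 'unhealthy', 'unhealthy']
```

With checks running every second, I would therefore have expected the status to change a few seconds after the first 404.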
My questions:
- Is the healthcheck only valid during startup, to wait for a container to be "ready"? The documentation seems to suggest so.
- If so, why does it keep checking for health after a successful startup?
- If not, why is the backend container not being marked as unhealthy when changes cause its healthcheck to fail while running?
- Is there a way to degrade a container to unhealthy while it is running, after a successful startup?
- I'm aware that I can use kill 1 instead of exit 1, and that would cause the backend container to stop, but that doesn't seem very clean.