
I want to implement a shutdown script that is called when my VM is about to be preempted on Google Compute Engine. The VM runs Docker containers that execute long-running batch jobs, so the script sends them a signal to make them exit gracefully.

The shutdown script works well when I run it manually, but it fails on a real preemption, or when I stop the VM myself.

I got this error:

... logs from my containers ...

A 2019-08-13T16:54:07.943153098Z time="2019-08-13T16:54:07Z" level=error msg="error waiting for container: unexpected EOF" 

(Just after this error, I can see the output of the first line of my shutdown script; see the code below.)

A 2019-08-13T16:54:08.093815210Z 2019-08-13 16:54:08: Shutting down!  TEST SIGTERM SHUTTING DOWN (this is the first line of my shutdown script)
A 2019-08-13T16:54:08.093845375Z docker ps -a 
(no result)
A 2019-08-13T16:54:08.155512145Z ps -ef 
... a lot of things, but nothing related to docker ...

2019-08-13 16:54:08: task_subscriber not running, shutting down immediately.

I use a preemptible VM on GCE with the Container-Optimized OS 73-11647.267.0 stable image. I run my Docker containers as services with systemctl, though I didn't think this was related. [Edit] Actually, this is what let me solve my issue.

At this point, I am fairly sure that a lot happens when Google sends the ACPI signal to my VM, even before the shutdown script is fetched from the VM metadata and executed.

My guess is that all services are being stopped at the same time, possibly including docker.service itself.

While my container is running, I get the same level=error msg="error waiting for container: unexpected EOF" error simply by running sudo systemctl stop docker.service.

Here is part of my shutdown script:


#!/bin/bash
# This script must be added in the VM metadata as "shutdown-script" so that
# it is executed when the instance is being preempted.


CONTAINER_NAME="task_subscriber" # Name of the container to stop gracefully

logTime() {
    # Declare separately so a failing date command is not masked by "local"
    local datetime
    datetime="$(date +"%Y-%m-%d %T")"
    echo -e "$datetime: $1" # Console
    echo -e "$datetime: $1" >>"/var/log/containers/${CONTAINER_NAME}.log"
}



logTime "Shutting down!  TEST SIGTERM SHUTTING DOWN"

echo "docker ps -a" >>"/var/log/containers/${CONTAINER_NAME}.log"
docker ps -a >>"/var/log/containers/${CONTAINER_NAME}.log"

echo "ps -ef" >>"/var/log/containers/${CONTAINER_NAME}.log"
ps -ef >>"/var/log/containers/${CONTAINER_NAME}.log"

if [[ ! "$(docker ps -q -f name=${CONTAINER_NAME})" ]]; then
    logTime "${CONTAINER_NAME} not running, shutting down immediately."
    sleep 10 # Give time to send logs
    exit 0
fi

logTime "Sending SIGTERM to ${CONTAINER_NAME}"
#docker kill --signal=SIGTERM ${CONTAINER_NAME}
systemctl stop taskexecutor.service

# Portable waitpid equivalent
while [[ "$(docker ps -q -f name=${CONTAINER_NAME})" ]]; do
    sleep 1
    logTime "Waiting for ${CONTAINER_NAME} termination"
done

logTime "${CONTAINER_NAME} is done, shutting down."
logTime "TEST SIGTERM SHUTTING DOWN BYE BYE"
sleep 10 # Give time to send logs
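The polling loop in the script above waits with no upper bound, while a preempted instance only gets roughly 30 seconds before it is stopped. A minimal sketch of a bounded wait in the same bash style follows; wait_until_empty is a hypothetical helper and the 25-second budget in the usage comment is an assumption, neither comes from the original script.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original script): poll until a command
# prints nothing, but give up once a deadline has passed.
#   wait_until_empty TIMEOUT_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as COMMAND's output is empty, 1 if the timeout expires first.
wait_until_empty() {
    local timeout="$1"
    shift
    # SECONDS is bash's built-in count of seconds since the shell started
    local deadline=$(( SECONDS + timeout ))
    while [[ -n "$("$@")" ]]; do
        if (( SECONDS >= deadline )); then
            return 1
        fi
        sleep 1
    done
    return 0
}

# Example use inside the shutdown script (25 s is an assumed budget that leaves
# a few seconds of the ~30 s preemption window for flushing logs):
#   if ! wait_until_empty 25 docker ps -q -f "name=${CONTAINER_NAME}"; then
#       logTime "${CONTAINER_NAME} still running after 25s, giving up."
#   fi
```

Returning a status instead of exiting lets the caller decide whether to log, force-kill the container, or simply continue with the rest of the shutdown.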

If I simply run systemctl stop taskexecutor.service manually (without actually shutting down the server), the SIGTERM signal is sent to my container, and my app handles it properly and exits.

Any ideas?

-- How I solved my issue --

I solved it by adding this dependency on docker.service to my service's unit file:

[Unit]
Wants=gcr-online.target docker.service
After=gcr-online.target docker.service
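This ordering likely helps because of a documented systemd rule: when two units with an ordering dependency are stopped, systemd applies the inverse of the start-up order, so with After=docker.service, taskexecutor.service is stopped before docker.service during shutdown. As a sketch, the same lines could also be installed as a drop-in instead of being edited into the main unit file. The helper name write_docker_dropin, the file name 10-docker-order.conf and the default /etc/systemd/system location are assumptions for illustration. On Container-Optimized OS, /etc may not persist across reboots, so something like this would typically run from a startup script or cloud-init.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a systemd drop-in ordering UNIT after docker.service.
#   write_docker_dropin UNIT [BASE_DIR]
# BASE_DIR defaults to /etc/systemd/system (an assumption; override it for testing).
# Prints the path of the file it wrote.
write_docker_dropin() {
    local unit="$1"
    local base_dir="${2:-/etc/systemd/system}"
    local dropin_dir="${base_dir}/${unit}.d"
    mkdir -p "$dropin_dir"
    # Same [Unit] lines as above. Because stop order is the reverse of start
    # order, After= also makes systemd stop this unit before docker.service.
    cat >"${dropin_dir}/10-docker-order.conf" <<'EOF'
[Unit]
Wants=gcr-online.target docker.service
After=gcr-online.target docker.service
EOF
    echo "${dropin_dir}/10-docker-order.conf"
}
```

After writing the drop-in, running sudo systemctl daemon-reload makes systemd pick it up, and systemctl show -p After taskexecutor.service should then list docker.service.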

I don't know exactly how Google runs the shutdown script stored in the instance metadata. But I think Google should change something in Container-Optimized OS so that the script runs before Docker is stopped. Otherwise, we cannot rely on it to gracefully shut down a plain script (fortunately, I was using systemd here)...

gjanvier

1 Answer


According to the documentation [1], you can use shutdown scripts on preemptible VM instances. However, shutdown scripts have some limitations: Compute Engine executes them only on a best-effort basis, and in rare cases it cannot guarantee that a shutdown script will complete. Also note that preemptible instances have only 30 seconds after preemption begins [2], which might be why Docker is killed before your shutdown script runs. Judging from the error message in your logs, this appears to be expected behaviour when Docker has been running continuously for a long time [3].

[1] https://cloud.google.com/compute/docs/instances/create-start-preemptible-instance#handle_preemption
[2] https://cloud.google.com/compute/docs/shutdownscript#limitations
[3] https://github.com/docker/for-mac/issues/1941

Gautham