4

I am deploying a RunCommand to an azure vm (ubuntu) using ARM (bicep).

I can see my script never completes, and this is the output from the log that the waaagent keeps:

sudo cat /var/log/azure/run-command-handler/handler.log

Writing a placeholder status file indicating progress before forking: /devopsDeploymentAgent.0.status
+ nohup /var/lib/waagent/Microsoft.CPlat.Core.RunCommandHandlerLinux-1.3.2/bin/run-command-handler enable
time=2023-01-09T16:12:47Z version=v1.3.2/git@6efb77e-clean operation=enable extensionName=devopsDeploymentAgent seq=0 event=start
time=2023-01-09T16:12:47Z version=v1.3.2/git@6efb77e-clean operation=enable extensionName=devopsDeploymentAgent seq=0 event=pre-check
time=2023-01-09T16:12:47Z version=v1.3.2/git@6efb77e-clean operation=enable extensionName=devopsDeploymentAgent seq=0 event="comparing seqnum" path=devopsDeploymentAgent.mrseq
time=2023-01-09T16:12:47Z version=v1.3.2/git@6efb77e-clean operation=enable extensionName=devopsDeploymentAgent seq=0 event="seqnum saved" path=devopsDeploymentAgent.mrseq
time=2023-01-09T16:12:47Z version=v1.3.2/git@6efb77e-clean operation=enable extensionName=devopsDeploymentAgent seq=0 event="reading configuration from /var/lib/waagent/Microsoft.CPlat.Core.
RunCommandHandlerLinux-1.3.2/config/devopsDeploymentAgent.0.settings"
time=2023-01-09T16:12:47Z version=v1.3.2/git@6efb77e-clean operation=enable extensionName=devopsDeploymentAgent seq=0 event="read configuration"
time=2023-01-09T16:12:47Z version=v1.3.2/git@6efb77e-clean operation=enable extensionName=devopsDeploymentAgent seq=0 event="validating json schema"
time=2023-01-09T16:12:47Z version=v1.3.2/git@6efb77e-clean operation=enable extensionName=devopsDeploymentAgent seq=0 event="json schema valid"
time=2023-01-09T16:12:47Z version=v1.3.2/git@6efb77e-clean operation=enable extensionName=devopsDeploymentAgent seq=0 event="parsing configuration json"
time=2023-01-09T16:12:47Z version=v1.3.2/git@6efb77e-clean operation=enable extensionName=devopsDeploymentAgent seq=0 event="parsed configuration json"
time=2023-01-09T16:12:47Z version=v1.3.2/git@6efb77e-clean operation=enable extensionName=devopsDeploymentAgent seq=0 event="validating configuration logically"
time=2023-01-09T16:12:47Z version=v1.3.2/git@6efb77e-clean operation=enable extensionName=devopsDeploymentAgent seq=0 event="validated configuration"
time=2023-01-09T16:12:47Z version=v1.3.2/git@6efb77e-clean operation=enable extensionName=devopsDeploymentAgent seq=0 event="creating output directory" path=/var/lib/waagent/run-command-hand
ler/download/devopsDeploymentAgent/0
time=2023-01-09T16:12:47Z version=v1.3.2/git@6efb77e-clean operation=enable extensionName=devopsDeploymentAgent seq=0 event="created output directory"
time=2023-01-09T16:12:47Z version=v1.3.2/git@6efb77e-clean operation=enable extensionName=devopsDeploymentAgent seq=0 scriptUri=
time=2023-01-09T16:12:47Z version=v1.3.2/git@6efb77e-clean operation=enable extensionName=devopsDeploymentAgent seq=0 event="executing command" output=/var/lib/waagent/run-command-handler/do
wnload/devopsDeploymentAgent/0
time=2023-01-09T16:12:47Z version=v1.3.2/git@6efb77e-clean operation=enable extensionName=devopsDeploymentAgent seq=0 event="prepare command" scriptFile=/var/lib/waagent/run-command-handler/
download/devopsDeploymentAgent/0/script.sh
time=2023-01-09T16:12:47Z version=v1.3.2/git@6efb77e-clean operation=enable extensionName=devopsDeploymentAgent seq=0 message="Execute with TimeoutInSeconds=600"
time=2023-01-09T16:12:48Z version=v1.3.2/git@6efb77e-clean operation=enable extensionName=devopsDeploymentAgent seq=0 message="Timeout:signal: killed"
time=2023-01-09T16:12:48Z version=v1.3.2/git@6efb77e-clean operation=enable extensionName=devopsDeploymentAgent seq=0 event="failed to execute command" error="command terminated with exit st
atus=-1" output=/var/lib/waagent/run-command-handler/download/devopsDeploymentAgent/0
time=2023-01-09T16:12:48Z version=v1.3.2/git@6efb77e-clean operation=enable extensionName=devopsDeploymentAgent seq=0 event="enable script failed"
time=2023-01-09T16:12:48Z version=v1.3.2/git@6efb77e-clean operation=enable extensionName=devopsDeploymentAgent seq=0 event="failed to handle" error="failed to execute command: command termi
nated with exit status=-1"

As you can see it looks like it exits with a -1 "Timeout:signal: killed". However the timeout for this runcommand (as also seen from the logs) is 5 minutes and it starts at 2023-01-09T16:12:47Z and terminates with this exit code at 2023-01-09T16:12:48 which is 1 second.

I have included the script.sh and the stdout and stderr below.

The script itself does this:-

  • parses some arguments
  • docker login to an azure container registry
  • docker pull for an image
  • checks to see if a named docker container is already running and if so stops it, removes it
  • runs a new docker container based on the pulled image.

From the output that the script writes to stdout I can see that the script gets as far as successfully pulling the image, but then doesn't execute the next line after the docker pull - which is a simple echo statement. So it feels to me like something is actively killing the script - i.e waagent. From stderr I can see that docker login writes a warning to say a credential helper should be configured, however this isn't causing the script to terminate as it proceeds to do the docker pull of the image, and there are no further errors or warnings written.

Here is the script.sh

#!/bin/bash
while getopts u:z:o:i:t:a:l:p: flag

do
    case "${flag}" in        
          u) AZP_URL=${OPTARG};;
          z) AZP_TOKEN=${OPTARG};;
          o) AZP_POOL=${OPTARG};;
          i) AZP_AGENT_DOCKER_IMAGE=${OPTARG};;
          t) AZP_AGENT_DOCKER_IMAGE_TAG=${OPTARG};;
          a) AZP_AGENT_DOCKER_ACR_NAME=${OPTARG};;
          l) AZP_AGENT_DOCKER_ACR_USERNAME=${OPTARG};;
          p) AZP_AGENT_DOCKER_ACR_PASSWORD=${OPTARG};;        
     esac
done
## start devops agent
## pull image
## The PAT must have Agent Pools (read, manage) scope (for a deployment group agent, the PAT must have Deployment group (read, manage) scope), and while a single PAT can be used for registering multiple agents, the PAT is used only at the time of registering the agent, and not for subsequent communication.
if [ -n "$AZP_URL" ]; then
echo "docker image is $AZP_AGENT_DOCKER_IMAGE"
echo "installing az agent"
AZP_AGENT_NAME=$(hostname)
echo $AZP_AGENT_DOCKER_ACR_PASSWORD| sudo docker login $AZP_AGENT_DOCKER_ACR_NAME.azurecr.io --username $AZP_AGENT_DOCKER_ACR_USERNAME --password-stdin
# az acr login --name $AZP_AGENT_DOCKER_ACR_NAME
imagePath="$AZP_AGENT_DOCKER_ACR_NAME.azurecr.io/$AZP_AGENT_DOCKER_IMAGE:$AZP_AGENT_DOCKER_IMAGE_TAG"
dockerPullOutput=$(sudo docker pull $imagePath)
echo "finished pulling image $imagePath output was: $dockerPullOutput"
containerName="devops-agent"
# name container and stop and remove any previous one to make this idempotent
if [ ! "$(sudo docker ps -a -q -f name=$containerName)" ]; then
    sudo docker stop $containerName
    sudo docker rm $containerName       
fi

echo "running devops agent $AZP_URL $AZP_POOL $AZP_TOKEN $AZP_AGENT_NAME $imagePath"
sudo docker run -d --name $containerName --restart=always -e AZP_URL="$AZP_URL" -e AZP_POOL="$AZP_POOL" -e AZP_TOKEN="$AZP_TOKEN" -e AZP_AGENT_NAME="$AZP_AGENT_NAME" -v /var/run/docker.sock:/var/run/docker.sock $imagePath --once
else
echo "skipping install of devops agent as no devops url provided."
fi

and here is the complete stdout as you can see it only executes up to the docker pull based on this output:

sudo cat /var/lib/waagent/run-command-handler/download/devopsDeploymentAgent/0/stdout
docker image is devops/deployment-agent
installing az agent
Login Succeeded
0.1.0-alpha.14: Pulling from devops/deployment-agent
eaead16dc43b: Pulling fs layer
9e72ce0a20ef: Pulling fs layer
e3e8dc6cc33c: Pulling fs layer
b1da6338b120: Pulling fs layer
d746e1c7437c: Pulling fs layer
f9a8bd2a02f1: Pulling fs layer
264e044344ec: Pulling fs layer
2f8fea2b75d2: Pulling fs layer
ffbdacaca4cc: Pulling fs layer
3dd9baa15bcf: Pulling fs layer
891f9750edbc: Pulling fs layer
669c2f9dbfd8: Pulling fs layer
264e044344ec: Waiting
2f8fea2b75d2: Waiting
ffbdacaca4cc: Waiting
3dd9baa15bcf: Waiting
891f9750edbc: Waiting
669c2f9dbfd8: Waiting
b1da6338b120: Waiting
d746e1c7437c: Waiting
f9a8bd2a02f1: Waiting
e3e8dc6cc33c: Download complete
eaead16dc43b: Download complete
9e72ce0a20ef: Verifying Checksum
9e72ce0a20ef: Download complete
f9a8bd2a02f1: Verifying Checksum
f9a8bd2a02f1: Download complete
264e044344ec: Verifying Checksum
264e044344ec: Download complete
b1da6338b120: Verifying Checksum
b1da6338b120: Download complete
ffbdacaca4cc: Verifying Checksum
ffbdacaca4cc: Download complete
3dd9baa15bcf: Verifying Checksum
3dd9baa15bcf: Download complete
891f9750edbc: Verifying Checksum
891f9750edbc: Download complete
669c2f9dbfd8: Verifying Checksum
669c2f9dbfd8: Download complete
d746e1c7437c: Verifying Checksum
d746e1c7437c: Download complete
2f8fea2b75d2: Verifying Checksum
2f8fea2b75d2: Download complete
eaead16dc43b: Pull complete
9e72ce0a20ef: Pull complete
e3e8dc6cc33c: Pull complete
b1da6338b120: Pull complete
d746e1c7437c: Pull complete
f9a8bd2a02f1: Pull complete
264e044344ec: Pull complete
2f8fea2b75d2: Pull complete
ffbdacaca4cc: Pull complete
3dd9baa15bcf: Pull complete
891f9750edbc: Pull complete
669c2f9dbfd8: Pull complete
Digest: sha256:b2c4f630720565fab01993473d3838f5f20632b581d649ea16d7383dac11538d
Status: Downloaded newer image for labdocker.azurecr.io/devops/deployment-agent:0.1.0-alpha.14
labdocker.azurecr.io/devops/deployment-agent:0.1.0-alpha.14

and stderr:

sudo cat /var/lib/waagent/run-command-handler/download/devopsDeploymentAgent/0/stderr
WARNING! Your password will be stored unencrypted in /root/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

This is not the first time I have seen a run command script seemingly exits for no reason - it happened with another where I was calling sudo gluster volume create commands. Also worth mentioning I am executing multiple separate run commands, one after the other as part of the same deployment. I am being very sure to only deploy one Run Command at a time as I wasn't sure the waagent could deal with concurrency.

For this example, here are the deployment activities shown in the portal - each activity (marked in red) corresponds to a bicep template file that depoys one or more run commands accross a cluster of vm's - one per vm. Each activity runs it's commands accross the seperate vm's, and finishes, before the next one starts. I can't see anything from the times of these activities or how its written that would mean multiple run commands are being executed on a single vm at the same time, so this looks ok to me.

portal deployment

Note: If I run the same exact deployment a second time, this time the script finishes. The difference is that the scripts, when run a second time, have already done lots of work the first time round, and in the case of this script it has already pulled the image, so this time round the pull completes very fast and it then hasn't terminated and successfully runs the docker container.

My question is, what is causing the script to terminate?

Darrell
  • 1,905
  • 23
  • 31

0 Answers0