Our application requires the use of HAProxy to load balance and route traffic (one per AZ), ALBs and ELBs are not configurable enough for our purposes. When deploying new code via AWS CodeDeploy, we would like the instances being patched to be placed into Maintenance Mode (removed from load balancing, connections drained). We have modified the default CodeDeploy lifecycle bash scripts to remove the instances from their respective HAProxy instances by sending an SSM Run Command to HAProxy from the instances in question. Currently this modification doesn't work, and the reason for failure is unknown. The script works when executed manually step by step (at least to the current point of failure). The part that fails is either the test that returns "$INSTANCE_ID doesn't seem to be in an AZ with a HAProxy instance, skipping deregistration.", or the setting of $HAPROXY_ID which aforementioned test depends on. The script runs just fine up until that point, but at that point, exits because it can't find the HAProxy instance ID.
I have checked IAM role permissions/credentials, environment variables, and file permissions which all appear to be correct. Normally I would place more logging into the script to debug, but deployments are too few and far between for us to make that practical.
My question: Is there a better way to do this? I can only guess we're not the only ones to use HAProxy with CodeDeploy, and there has to be a reliable method of doing this. Below is the current code being used that is not working.
#!/bin/bash
#
# Copyright 2014 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License").
# You may not use this file except in compliance with the License.
# A copy of the License is located at
#
# http://aws.amazon.com/apache2.0
#
# or in the "license" file accompanying this file. This file is distributed
# on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
# express or implied. See the License for the specific language governing
# permissions and limitations under the License.
. $(dirname $0)/common_functions.sh
if [[ "$DEPLOYMENT_GROUP_NAME" != "redacted" ]]; then
msg "ELB Deregistration doesn't need to happen when not on redacted."
exit
fi
msg "Running AWS CLI with region: $(get_instance_region)"
# get this instance's ID
INSTANCE_ID=$(get_instance_id)
if [ $? != 0 -o -z "$INSTANCE_ID" ]; then
error_exit "Unable to get this instance's ID; cannot continue."
fi
# Get current time
msg "Started $(basename $0) at $(/bin/date "+%F %T")"
start_sec=$(/bin/date +%s.%N)
msg "Checking if instance $INSTANCE_ID is part of an AutoScaling group"
asg=$(autoscaling_group_name $INSTANCE_ID)
if [ $? == 0 -a -n "${asg}" ]; then
msg "Found AutoScaling group for instance $INSTANCE_ID: ${asg}"
msg "Checking that installed CLI version is at least at version required for AutoScaling Standby"
check_cli_version
if [ $? != 0 ]; then
error_exit "CLI must be at least version ${MIN_CLI_X}.${MIN_CLI_Y}.${MIN_CLI_Z} to work with AutoScaling Standby"
fi
msg "Attempting to put instance into Standby"
autoscaling_enter_standby $INSTANCE_ID "${asg}"
if [ $? != 0 ]; then
error_exit "Failed to move instance into standby"
else
msg "Instance is in standby"
fi
fi
msg "Instance is not part of an ASG, continuing..."
## Get the instanceID of the HAProxy instance in this AZ and ENVIRONMENT - Will there ever be more than one???
HAPROXY_ID=$(/usr/local/bin/aws ec2 describe-instances --region us-east-1 --filters "Name=availability-zone,Values=$(/usr/bin/curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)" "Name=tag:deployment_group,Values=haproxy.$ENVIRONMENT" --output text | \
grep INSTANCES | \
awk '{print $8}' )
HAPROXY_IP=$(/usr/local/bin/aws ec2 describe-instances --region us-east-1 --filters "Name=availability-zone,Values=$(/usr/bin/curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)" "Name=tag:deployment_group,Values=haproxy.$ENVIRONMENT" --output text | \
grep INSTANCES | \
awk '{print $13}' )
if test -z "$HAPROXY_ID"; then
msg "$INSTANCE_ID doesn't seem to be in an AZ with a HAProxy instance, skipping deregistration."
exit
fi
## Put the current instance into MAINT mode with the HAProxy instance via SSM
msg "Deregistering $INSTANCE_ID from HAProxy $HAPROXY_ID"
DEREGCMD="{\"commands\":[\"haproxyctl disable server bk_app_servers/$INSTANCEID\"],\"executionTimeout\":[\"3600\"]}"
/usr/local/bin/aws ssm send-command \
--document-name "AWS-RunShellScript" \
--instance-ids "$HAPROXY_ID" \
--parameters "$DEREGCMD" \
--timeout-seconds 600 \
--output-s3-bucket-name "redacted" \
--output-s3-key-prefix "haproxy-codedeploy/deregister" \
--region us-east-1
if [ $? != 0 ]; then
error_exit "Failed to send SSM command to deregister instance $INSTANCE_ID from HAProxy $HAPROXY_ID"
fi
## Wait for all connections to drain from instance
SESS_COUNT=$(/usr/bin/curl -s "http://$HAPROXY_IP:<portredacted>/<urlredacted>" | grep $INSTANCEID | awk -F "," '{print $5}')
DRAIN_TIME=60
msg "Initial session count: $SESS_COUNT"
while [[ "$SESS_COUNT" -gt 0 ]]; do
if [[ "$COUNTER" -gt "$DRAIN_TIME" ]]; then
msg "Instance failed to drain all connections within $DRAIN_TIME seconds. Continuing to deploy anyway."
break
fi
msg $SESS_COUNT
sleep 1
COUNTER=$(($COUNTER + 1))
SESS_COUNT=$(/usr/bin/curl -s "http://$HAPROXY_IP:<portredacted>/<urlredacted>" | grep $INSTANCEID | awk -F "," '{print $5}')
done
msg "Finished $(basename $0) at $(/bin/date "+%F %T")"
end_sec=$(/bin/date +%s.%N)
elapsed_seconds=$(echo "$end_sec - $start_sec" | /usr/bin/bc)
msg "Elapsed time: $elapsed_seconds"