Our application requires HAProxy (one instance per AZ) to load balance and route traffic, because ALBs and ELBs are not configurable enough for our purposes. When deploying new code via AWS CodeDeploy, we would like the instances being patched to be placed into maintenance mode (removed from load balancing, with connections drained). We have modified the default CodeDeploy lifecycle bash scripts so that each instance removes itself from its respective HAProxy instance by sending an SSM Run Command to HAProxy.

Currently this modification doesn't work, and the reason for the failure is unknown. The script works when executed manually step by step (at least up to the current point of failure). The part that fails is either the test that returns "$INSTANCE_ID doesn't seem to be in an AZ with a HAProxy instance, skipping deregistration.", or the setting of $HAPROXY_ID, which that test depends on. The script runs fine up to that point, then exits because it can't find the HAProxy instance ID.

I have checked IAM role permissions/credentials, environment variables, and file permissions, all of which appear to be correct. Normally I would add more logging to the script to debug it, but our deployments are too infrequent to make that practical.

My question: Is there a better way to do this? I can only guess we're not the only ones to use HAProxy with CodeDeploy, and there has to be a reliable method of doing this. Below is the current code being used that is not working.

#!/bin/bash
#
# Copyright 2014 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License").
# You may not use this file except in compliance with the License.
# A copy of the License is located at
#
#  http://aws.amazon.com/apache2.0
#
# or in the "license" file accompanying this file. This file is distributed
# on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
# express or implied. See the License for the specific language governing
# permissions and limitations under the License.

. $(dirname $0)/common_functions.sh

if [[ "$DEPLOYMENT_GROUP_NAME" != "redacted" ]]; then
  msg "ELB Deregistration doesn't need to happen when not on redacted."
  exit
fi

msg "Running AWS CLI with region: $(get_instance_region)"

# get this instance's ID
INSTANCE_ID=$(get_instance_id)
if [ $? != 0 -o -z "$INSTANCE_ID" ]; then
    error_exit "Unable to get this instance's ID; cannot continue."
fi

# Get current time
msg "Started $(basename $0) at $(/bin/date "+%F %T")"
start_sec=$(/bin/date +%s.%N)

msg "Checking if instance $INSTANCE_ID is part of an AutoScaling group"
asg=$(autoscaling_group_name $INSTANCE_ID)
if [ $? == 0 -a -n "${asg}" ]; then
    msg "Found AutoScaling group for instance $INSTANCE_ID: ${asg}"

    msg "Checking that installed CLI version is at least at version required for AutoScaling Standby"
    check_cli_version
    if [ $? != 0 ]; then
        error_exit "CLI must be at least version ${MIN_CLI_X}.${MIN_CLI_Y}.${MIN_CLI_Z} to work with AutoScaling Standby"
    fi

    msg "Attempting to put instance into Standby"
    autoscaling_enter_standby $INSTANCE_ID "${asg}"
    if [ $? != 0 ]; then
        error_exit "Failed to move instance into standby"
    else
        msg "Instance is in standby"
    fi
else
    msg "Instance is not part of an ASG, continuing..."
fi

## Get the instanceID of the HAProxy instance in this AZ and ENVIRONMENT - Will there ever be more than one???

HAPROXY_ID=$(/usr/local/bin/aws ec2 describe-instances --region us-east-1 --filters "Name=availability-zone,Values=$(/usr/bin/curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)" "Name=tag:deployment_group,Values=haproxy.$ENVIRONMENT" --output text  | \
grep INSTANCES | \
awk '{print $8}' )

HAPROXY_IP=$(/usr/local/bin/aws ec2 describe-instances --region us-east-1 --filters "Name=availability-zone,Values=$(/usr/bin/curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)" "Name=tag:deployment_group,Values=haproxy.$ENVIRONMENT" --output text  | \
grep INSTANCES | \
awk '{print $13}' )

if test -z "$HAPROXY_ID"; then
    msg "$INSTANCE_ID doesn't seem to be in an AZ with a HAProxy instance, skipping deregistration."
    exit
fi

## Put the current instance into MAINT mode with the HAProxy instance via SSM

msg "Deregistering $INSTANCE_ID from HAProxy $HAPROXY_ID"

# Use INSTANCE_ID (set above); INSTANCEID is never defined in this script.
DEREGCMD="{\"commands\":[\"haproxyctl disable server bk_app_servers/$INSTANCE_ID\"],\"executionTimeout\":[\"3600\"]}"

/usr/local/bin/aws ssm send-command \
--document-name "AWS-RunShellScript" \
--instance-ids "$HAPROXY_ID" \
--parameters "$DEREGCMD" \
--timeout-seconds 600 \
--output-s3-bucket-name "redacted" \
--output-s3-key-prefix "haproxy-codedeploy/deregister" \
--region us-east-1

if [ $? != 0 ]; then
    error_exit "Failed to send SSM command to deregister instance $INSTANCE_ID from HAProxy $HAPROXY_ID"
fi

## Wait for all connections to drain from instance

SESS_COUNT=$(/usr/bin/curl -s "http://$HAPROXY_IP:<portredacted>/<urlredacted>" | grep "$INSTANCE_ID" | awk -F "," '{print $5}')
DRAIN_TIME=60
COUNTER=0

msg "Initial session count: $SESS_COUNT"

while [[ "$SESS_COUNT" -gt 0 ]]; do
    if [[ "$COUNTER" -gt "$DRAIN_TIME" ]]; then
        msg "Instance failed to drain all connections within $DRAIN_TIME seconds. Continuing to deploy anyway."
        break
    fi
    msg $SESS_COUNT
    sleep 1
    COUNTER=$(($COUNTER + 1))
    SESS_COUNT=$(/usr/bin/curl -s "http://$HAPROXY_IP:<portredacted>/<urlredacted>" | grep "$INSTANCE_ID" | awk -F "," '{print $5}')
done

msg "Finished $(basename $0) at $(/bin/date "+%F %T")"

end_sec=$(/bin/date +%s.%N)
elapsed_seconds=$(echo "$end_sec - $start_sec" | /usr/bin/bc)

msg "Elapsed time: $elapsed_seconds"
Andrew

1 Answer

At the moment, your only option is to add more logging, run a deployment to exercise the script, and then look at your deployment logs. It sounds like you don't know why it's failing, and only the logs can tell you that.
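As a rough sketch of what that logging could look like around the step that fails, the lookup can be wrapped in a function that records its inputs, the CLI's stderr, and the exit status. The names `debug_log`, `find_haproxy_id`, `LOG_FILE`, and `AWS_CLI` are made up for illustration, and the `--query` expression replaces the `grep`/`awk` column parsing on the assumption that only the instance ID is needed here.

```shell
#!/bin/bash
# Sketch only: LOG_FILE, AWS_CLI, debug_log and find_haproxy_id are
# hypothetical names, not part of the CodeDeploy sample scripts.
LOG_FILE="${LOG_FILE:-/tmp/haproxy-deregister-debug.log}"
AWS_CLI="${AWS_CLI:-aws}"

# Append a timestamped line to the debug log.
debug_log() {
    echo "$(date '+%F %T') $*" >> "$LOG_FILE"
}

# Print the ID(s) of instances in the given AZ that carry the given
# deployment_group tag value. Inputs, stderr and exit status are logged.
find_haproxy_id() {
    local az="$1" tag_value="$2" out rc
    debug_log "lookup az='$az' tag='$tag_value' ENVIRONMENT='${ENVIRONMENT:-<unset>}'"
    out=$("$AWS_CLI" ec2 describe-instances \
        --region us-east-1 \
        --filters "Name=availability-zone,Values=$az" \
                  "Name=tag:deployment_group,Values=$tag_value" \
        --query 'Reservations[].Instances[].InstanceId' \
        --output text 2>>"$LOG_FILE")
    rc=$?
    debug_log "lookup exit=$rc result='$out'"
    echo "$out"
    return $rc
}
```

In the hook script this might be called as `HAPROXY_ID=$(find_haproxy_id "$AZ" "haproxy.$ENVIRONMENT")`, where `$AZ` holds the availability zone read from instance metadata. If `$ENVIRONMENT` turns out to be unset during a deployment, the log line will show it, and the tag filter would then be `haproxy.` and match nothing.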

Try adding logging and see what happens. We should be executing your script as is, so it shouldn't behave any differently, but it's hard to tell without seeing the logs.
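One difference that may be worth ruling out between deployments is the environment: a script started by the agent will not necessarily see variables that exist in your interactive shell, such as `$ENVIRONMENT`. The sketch below reruns a hook script under a nearly empty environment to approximate that. `run_like_agent` is a hypothetical helper, and the exact set of variables present during a real deployment is an assumption here; only `DEPLOYMENT_GROUP_NAME` is taken from your script.

```shell
#!/bin/bash
# Sketch only: run_like_agent is a hypothetical helper. It starts the given
# script with an empty environment plus a basic PATH and the one deployment
# variable the script is known to read.
run_like_agent() {
    local script="$1" group="$2"
    env -i \
        PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" \
        DEPLOYMENT_GROUP_NAME="$group" \
        /bin/bash "$script"
}
```

For example, `run_like_agent ./deregister_from_elb.sh redacted` (the script file name is assumed) would show whether the lookup still succeeds when nothing from your login shell is inherited. Be aware that this runs the script for real, so only try it on an instance that can safely be taken out of rotation.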

Good luck, -Asaf