I am building a redundant Schema Registry hosted in Amazon for our MSK Kafka Cluster by using an ECS cluster.

The SchemaRegistry TaskDefinition needs to define a hostname which is unique to each Task when running.

SchemaRegistryTaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      Family: !Ref SchemaRegistryTaskName
      RequiresCompatibilities: [ EC2 ]
      NetworkMode: bridge
      Cpu: !Ref CPUReservation
      Memory: !Ref MemoryReservation
      Volumes: []
      ContainerDefinitions:
        - Name: !Ref SchemaRegistryTaskName
          Image: !Ref SchemaRegistryTaskImage
          Essential: true
          PortMappings:
            - ContainerPort: !Ref SchemaRegistryPort
              HostPort: 0 # Randomly assigned port from the ephemeral port range.
          Environment:
            - Name: AWS_DEFAULT_REGION
              Value: !Ref AWS::Region
            - Name: SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS
              Value: !Ref MskBrokerUrls
            - Name: SCHEMA_REGISTRY_HOST_NAME
              Value: $HOSTNAME
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-group: !Ref 'CloudwatchLogsGroup'
              awslogs-region: !Ref 'AWS::Region'

NB: Using $HOSTNAME works when running the Docker container directly on an EC2 instance via the CLI, because the shell substitutes the fully qualified hostname, which is unique. I am stumped trying to figure out how to make this work within ECS and CloudFormation.

Syntax

4 Answers

What I did was add an entrypoint script to the Docker image that looks up the host IP from the EC2 instance metadata endpoint and exposes it as the environment variable SCHEMA_REGISTRY_HOST_NAME. A sample script is below.

#!/bin/sh

#########
# Detect whether this is running on an EC2 instance (including an ECS
# container instance) by probing the instance metadata endpoint.
# The test is written as a plain "if command" because [[ ]] is a
# bash extension and is not available in every /bin/sh.
#########
if curl --max-time 1 -s --fail -o /dev/null http://169.254.169.254/; then
    echo "AWS environment was detected - looking up HOST IP from metadata"
    SCHEMA_REGISTRY_HOST_NAME=$(curl -s --fail http://169.254.169.254/latest/meta-data/local-ipv4)
    export SCHEMA_REGISTRY_HOST_NAME
else
    echo "Not running in AWS environment. Will not set SCHEMA_REGISTRY_HOST_NAME"
fi

You can also take a look at this one on how to do local testing/development as well.
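
A lookup like this can still come back empty or malformed, in which case the variable would be exported with a useless value. Below is a minimal sketch of a guard, assuming a dotted-quad check is sufficient for your setup. The name `is_ipv4` is a hypothetical helper and not part of the original script, and the `candidate` value is a stand-in for the result of the curl call.

```shell
# is_ipv4 is a hypothetical helper, not part of the original script.
# Succeeds only when $1 is four dot-separated decimal fields, each 0-255.
is_ipv4() {
    case $1 in
        # Reject empty input, any character other than digits and dots,
        # and leading, trailing, or doubled dots.
        ''|*[!0-9.]*|.*|*.|*..*) return 1 ;;
    esac
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address on dots into $1..$n
    IFS=$old_ifs
    [ "$#" -eq 4 ] || return 1
    for octet in "$@"; do
        [ "${#octet}" -le 3 ] && [ "$octet" -le 255 ] || return 1
    done
    return 0
}

# Stand-in for the value returned by the metadata lookup (assumption).
candidate=10.0.0.12

# Export the variable only when the value looks like an address.
if is_ipv4 "$candidate"; then
    SCHEMA_REGISTRY_HOST_NAME=$candidate
    export SCHEMA_REGISTRY_HOST_NAME
fi
```

In the real entrypoint, `candidate` would be assigned from the metadata call, and the else branch could log a warning instead of silently skipping the export.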

sayboras
  • Thanks for the suggestion, are you able to clarify what service is running at 169.254.169.254? Is this some custom service you host which wraps an Amazon API that does a lookup? – Syntax Nov 11 '19 at 03:03
  • It's the EC2 instance metadata endpoint provided by AWS: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html – sayboras Nov 11 '19 at 03:06

In the end I went with a custom Command and EntryPoint in the TaskDefinition's ContainerDefinitions, using the metadata endpoint suggested by @Apolozeus:

EntryPoint: ["/bin/bash"]
Command: ["-c","(export SCHEMA_REGISTRY_HOST_NAME=$(wget -qO- 169.254.169.254/latest/meta-data/local-ipv4);/etc/confluent/docker/run)"]

This ensures the SCHEMA_REGISTRY_HOST_NAME environment variable exists in the container and is set to the private IPv4 address of the EC2 instance the container is running on.

This is preferable to me because we don't control the Docker image being run (it is publicly available), and I don't want to wrap it in another Docker image that we then have to maintain.

Syntax
  • were there any problems with non-masters forwarding write requests to the master? The advertised port of 8081 by default wouldn't be exposed as ECS would map it to something else. – PragmaticProgrammer Feb 02 '21 at 16:04
  • Sorry, we stopped self hosting schema registry in AWS and use a third party provider instead; good luck! – Syntax Feb 03 '21 at 12:29

Adding on to Syntax's answer, to support Schema Registry in a cluster configuration (running more than 1 instance) you have to correctly configure the listeners to use the right port. To avoid hardcoding ports one can query the ECS metadata service and weave that into the Schema Registry config as follows:

EntryPoint: ["/bin/bash"]
Command: ["-c","(
   export SCHEMA_REGISTRY_HOST_NAME=$(curl 169.254.169.254/latest/meta-data/local-ipv4);
   curl $ECS_CONTAINER_METADATA_URI_V4 > ecs.json;
   export HOST_PORT=$(python -c \"import json; f = open('ecs.json').read(); data = json.loads(f); print(data['Ports'][0]['HostPort'])\");
   export SCHEMA_REGISTRY_LISTENERS=\"http://0.0.0.0:$HOST_PORT,http://0.0.0.0:8081\";
   /etc/confluent/docker/run)"]

The extra listener config is needed because $HOST_PORT is the port that is actually advertised to other instances of Schema Registry for forwarding write operations to the master.

With the accepted answer alone, your replicas would fail to forward writes, because they would send traffic to the default port 8081, which most likely is not the host port that ECS exposes.

This worked with ECS agent 1.50 and the Confluent Schema Registry 5.5.3 Docker image.
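
If the metadata lookup fails, HOST_PORT ends up empty and the listener string becomes `http://0.0.0.0:,http://0.0.0.0:8081`, which is not a usable value. Here is a minimal sketch of a guard that builds the string only from a numeric port. `build_listeners` is a hypothetical helper name, and falling back to the default port alone is an assumption about the behavior you want, not something the Confluent image does for you.

```shell
# build_listeners is a hypothetical helper, not part of the Confluent image.
# $1: host port reported by the ECS metadata (may be empty)
# $2: container port, defaults to 8081
build_listeners() {
    host_port=$1
    default_port=${2:-8081}
    case $host_port in
        # Empty or non-numeric: fall back to the container port only (assumption).
        ''|*[!0-9]*)
            printf 'http://0.0.0.0:%s\n' "$default_port"
            ;;
        # Numeric: listen on both the host-mapped port and the container port.
        *)
            printf 'http://0.0.0.0:%s,http://0.0.0.0:%s\n' "$host_port" "$default_port"
            ;;
    esac
}
```

Inside the Command above it could be used as `export SCHEMA_REGISTRY_LISTENERS=$(build_listeners "$HOST_PORT")`.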

PragmaticProgrammer

Here is a solution for ECS Fargate, based on one of the answers above. It depends on jq, so you may need to install that. It is also hardcoded to the first network, which may not fit your use case.


#!/bin/sh

#########
# Detect whether this is running in an ECS cluster
#########
if curl --max-time 1 -s --fail -o /dev/null "${ECS_CONTAINER_METADATA_URI_V4}"; then
    echo "AWS environment was detected - looking up HOST IP from metadata"
    curl -s "${ECS_CONTAINER_METADATA_URI_V4}" > meta.json
    # First IPv4 address of the first network; -r strips the JSON quotes
    SCHEMA_REGISTRY_HOST_NAME=$(jq -r '.Networks[0].IPv4Addresses[0]' meta.json)
    export SCHEMA_REGISTRY_HOST_NAME
else
    echo "Not running in AWS environment. Will not set SCHEMA_REGISTRY_HOST_NAME"
fi

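
If installing jq in the image is not an option, a rough fallback with grep and sed is sketched below. It assumes the metadata document contains a "Networks" array whose entries each carry an "IPv4Addresses" list, and that your grep supports `-o` (GNU, BSD and BusyBox grep do). `first_ipv4` is a hypothetical helper name. Text matching on JSON is fragile, so treat this as a stopgap rather than a replacement for a real JSON parser.

```shell
# first_ipv4 is a hypothetical helper: it reads the metadata JSON on stdin
# and prints the first address of the first "IPv4Addresses" list it finds.
first_ipv4() {
    tr -d ' \n\t' |                               # collapse the JSON onto one line
        grep -o '"IPv4Addresses":\["[0-9.]*"' |   # one match per network entry
        head -n 1 |                               # keep the first network only
        sed 's/.*"\([0-9.]*\)"$/\1/'              # strip everything but the address
}
```

With the file written by the script above, this would be called as `SCHEMA_REGISTRY_HOST_NAME=$(first_ipv4 < meta.json)`.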
Myles McDonnell