I have a large Java application that I'm trying to run on a Fargate cluster in AWS. The image runs successfully under Docker on my local machine. On Fargate it starts successfully, but eventually hits the following error, after which the application gets stuck:

! java.net.UnknownHostException: 690bd678bcf4: 690bd678bcf4: Name or service not known
! at java.net.InetAddress.getLocalHost(InetAddress.java:1505) ~[na:1.8.0_151]
! at tracelink.misc.SingletonTokenDBO$.<init>(SingletonTokenDBO.scala:34) ~[habari.jar:8.4-QUARTZ-SNAPSHOT]
! at tracelink.misc.SingletonTokenDBO$.<clinit>(SingletonTokenDBO.scala) ~[habari.jar:8.4-QUARTZ-SNAPSHOT]
!... 10 common frames omitted
Caused by: ! java.net.UnknownHostException: 690bd678bcf4: Name or service not known
! at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method) ~[na:1.8.0_151]
! at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928) ~[na:1.8.0_151]
! at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323) ~[na:1.8.0_151]
! at java.net.InetAddress.getLocalHost(InetAddress.java:1500) ~[na:1.8.0_151]
!... 12 common frames omitted

The offending line of Scala code is:

  private val machineName = InetAddress.getLocalHost().getHostName()

Some initial research suggests the error is related to the contents of the /etc/hosts file in the container. So I created a small test program that exhibits the same behavior as my real application, and also dumps the contents of /etc/hosts to stdout:

import java.net.*;
import java.io.*;

public class NetworkTest {
   public static void main(String[] args) throws InterruptedException, IOException, FileNotFoundException {
      while(true) {
         networkDump();
         Thread.sleep(10000);
      }
   }

   private static void networkDump() throws IOException {
      System.out.println("/etc/hosts:");
      System.out.println("");

      // try-with-resources closes the file on every pass through the loop
      try (BufferedReader reader = new BufferedReader(new FileReader("/etc/hosts"))) {
         String line;
         while((line = reader.readLine()) != null) {
            System.out.println(line);
         }
      }
      System.out.println("");

      dumpHostname();
   }

   private static void dumpHostname() {
      try {
         String hostname = InetAddress.getLocalHost().getHostName();
         System.out.printf("Hostname: %s\n\n", hostname);
      } catch(UnknownHostException e) {
         System.out.println(e.getMessage());
      }
   }
}

Dockerfile:

FROM openjdk:8

WORKDIR /site
ADD . /site

CMD ["java", "NetworkTest"]

The output I get from this in AWS looks like:

/etc/hosts:
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

3a5a4271a6e3: 3a5a4271a6e3: Name or service not known

Compare that with the output when running in Docker on my local machine:

> docker run networktest

/etc/hosts:
127.0.0.1   localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.17.0.4  82691e2fb948

Hostname: 82691e2fb948

The local version that does not get the exception has an entry in /etc/hosts for the hostname, while the AWS hosts file has no entry for the hostname. I've tried adding an /etc/rc.local file to manually add the hostname to the end of the localhost line, and just adding a RUN command in the Dockerfile to do the same thing. Neither has had any effect.

Does anyone know if there's a way to configure either the image or the ECS task definition to get the hostname properly configured in AWS?

Daniel McHenry

5 Answers


Pointing the hostname to 127.0.0.1 fixed the issue for me:

echo "127.0.0.1 $HOSTNAME" >> /etc/hosts

I'm using Docker Compose. So I have a docker-compose.yml file like this:

version: '2'

services:
  myservice:
    command: ["/set-hostname.sh", "--", "/run-service.sh"]

and then the set-hostname.sh file looks like this:

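#!/bin/bash

set -e

# Drop the leading "--" separator; the remaining arguments are the real command.
shift

# Map the container's hostname to loopback so InetAddress.getLocalHost() can resolve it.
echo "127.0.0.1 $HOSTNAME" >> /etc/hosts

# Replace this shell with the command, keeping each argument intact.
exec "$@"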
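
If changing the container's start command is not an option, the lookup can also be made tolerant in application code. The following is only a minimal sketch, not something from the original setup: the class name SafeHostname and its methods are hypothetical, and it assumes the container runtime exports a HOSTNAME environment variable (the same variable the script above relies on). It tries the normal JDK lookup first, then HOSTNAME, then falls back to "localhost".

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper: resolves a machine name without throwing when the
// container's hostname is missing from /etc/hosts.
public class SafeHostname {

    // Returns the first candidate that is neither null nor blank, or "localhost".
    public static String firstNonBlank(String... candidates) {
        for (String candidate : candidates) {
            if (candidate != null && !candidate.trim().isEmpty()) {
                return candidate.trim();
            }
        }
        return "localhost";
    }

    // Standard JDK lookup; returns null instead of propagating UnknownHostException.
    public static String lookupViaJdk() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    // Assumption: HOSTNAME is set in the container's environment.
    public static String machineName() {
        return firstNonBlank(lookupViaJdk(), System.getenv("HOSTNAME"));
    }

    public static void main(String[] args) {
        System.out.println("Machine name: " + machineName());
    }
}
```

From Scala, the failing initializer could then call SafeHostname.machineName() instead of InetAddress.getLocalHost().getHostName(). This only helps if the name is used as an identifier; it does not make the hostname resolvable for other code.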
djones
  • How did you run that command? As part of the docker file, logging in to the instance after it started, something else entirely? – Daniel McHenry Mar 09 '18 at 09:21

Exactly the same issue I was struggling with for a long time. This solution worked for me:

ENTRYPOINT ["/bin/sh", "-c" , "echo 127.0.0.1 $HOSTNAME >> /etc/hosts && exec mvn spring-boot:run"]
Paul Roub

I came across exactly the same issue. As you already mentioned, the hostname doesn't mean much inside the task. The only way I found to fetch the actual instance IP that is visible within the VPC was to use the AWS task metadata API: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-metadata-endpoint.html

I wired in the following code to fetch the local host's IP:

try {
    final ResponseEntity<String> taskInfoResponse = this.restTemplate.getForEntity("http://169.254.170.2/v2/metadata", String.class);
    log.info("Got AWS task info: {}", taskInfoResponse);
    log.info("Got AWS task info: {}", taskInfoResponse.getBody());
    if (taskInfoResponse.getStatusCode() == HttpStatus.OK) {
        try {
            final ObjectNode jsonNodes = this.objectMapper.readValue(taskInfoResponse.getBody(), ObjectNode.class);
            final JsonNode jsonNode = jsonNodes.get("Containers")
                    .get(0).get("Networks")
                    .get(0)
                    .get("IPv4Addresses").get(0);
            log.info("Got IP to use: {}", jsonNode);
            if (jsonNode != null) {
                awsTaskInfo.setTaskAddress(InetAddress.getByName(jsonNode.asText()));
            }
        } catch (IOException e) {
            throw new IllegalArgumentException(e);
        }
    } else {
        awsTaskInfo.setTaskAddress(InetAddress.getLoopbackAddress());
    }
} catch (ResourceAccessException e) {
    log.error("Failed to fetch AWS info", e);
    awsTaskInfo.setTaskAddress(InetAddress.getLoopbackAddress());
}
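
For anyone without Spring and Jackson on the classpath, here is a rough dependency-free sketch of the same idea. It is not the code I run: the class name TaskIpLookup and its methods are made up for illustration, and the regex is a crude stand-in for a real JSON parser that just grabs the first entry of the first "IPv4Addresses" array. It uses the ECS_CONTAINER_METADATA_URI_V4 environment variable when the platform sets it, and otherwise falls back to the v2 URL used above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: fetches ECS metadata and extracts the first IPv4 address.
public class TaskIpLookup {

    // Matches the first string inside the first "IPv4Addresses" array.
    private static final Pattern FIRST_IPV4 =
            Pattern.compile("\"IPv4Addresses\"\\s*:\\s*\\[\\s*\"([^\"]+)\"");

    // Returns the first IPv4 address found in the metadata JSON, or null if none.
    public static String firstIpv4(String json) {
        if (json == null) {
            return null;
        }
        Matcher matcher = FIRST_IPV4.matcher(json);
        return matcher.find() ? matcher.group(1) : null;
    }

    // Plain HTTP GET with short timeouts so a missing endpoint fails fast.
    public static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(2000);
        conn.setReadTimeout(2000);
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder body = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line);
            }
            return body.toString();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        String v4 = System.getenv("ECS_CONTAINER_METADATA_URI_V4");
        String url = (v4 != null && !v4.isEmpty()) ? v4 : "http://169.254.170.2/v2/metadata";
        try {
            System.out.println("Task IP: " + firstIpv4(fetch(url)));
        } catch (IOException e) {
            System.out.println("Metadata endpoint unreachable: " + e.getMessage());
        }
    }
}
```

Outside ECS the endpoint is not reachable, so main just reports the failure; a real application would fall back to the loopback address as in my code above.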
user3485142

I faced the same issue while trying to access S3 and SQS from a Lambda. The solution was not to specify the region when creating client instances. So instead of:

SqsAsyncClient.builder()
                .region(Region.of(region))
                .build();

Do this:

SqsAsyncClient.create();
Leonid Bor

Enabling the "DNS Hostnames" option on the VPC that the task uses resolved this issue for me.

David Ha