
From within a Docker container (running Ubuntu 18) on AWS ECS, I'm attempting to establish a connection to an outside data center. We've narrowed the issue down and believe the extra hop added by the local Docker network is causing the failure. This is supported by the fact that a curl request to the destination IP completes successfully from the Docker host EC2 instance, and also from inside the same Docker container when it is deployed to a subnet that is fewer than 33 hops from the destination IP.

When running traceroute <destination_ip> from within the container I see 33 hops:

root@1cfbdf43c8f5:~# traceroute -m36 <destination_ip>
traceroute to <destination_ip> (<destination_ip>), 36 hops max, 60 byte packets
 1  ip-172-17-0-1.us-east-2.compute.internal (172.17.0.1)  0.039 ms  0.014 ms  0.013 ms
 2  ip-10-133-216-197.us-east-2.compute.internal (10.133.216.197)  1.185 ms  1.146 ms  1.107 ms
 3  ec2-52-15-0-157.us-east-2.compute.amazonaws.com (52.15.0.157)  8.188 ms ec2-52-15-0-169.us-east-2.compute.amazonaws.com (52.15.0.169)  5.615 ms ec2-52-15-0-161.us-east-2.compute.amazonaws.com (52.15.0.161)  10.227 ms
 ...
32  <destination_ip>  24.706 ms  24.584 ms  24.698 ms
33  <destination_ip>  24.411 ms  24.426 ms  24.323 ms

The first hop is Docker and the second is the AWS NAT gateway; the route then winds its way through the AWS networks and finally arrives at the destination at hop 33.

When running curl <destination_address> while capturing with tcpdump -v host <destination_ip> on the EC2 host machine running Docker, I see that the request fails due to TTL:

ip-10-133-218-86.us-east-2.compute.internal > <destination_ip>: ICMP time exceeded in-transit, length 52

However, inspecting that same tcpdump shows the request has a TTL of 63 as it passes through the host, indicating it is correctly using the Ubuntu system default of 64:

Time to live: 63

My question is: what may cause a request sent with a TTL of 64 to fail when connecting to a destination IP that traceroute shows is only 33 hops away?

It seems our options at this point are to (1) decrease the number of hops between source and destination, or else (2) increase the TTL of the outgoing request.

In an attempt to do (2), increase the TTL, I've tried changing the value in /proc/sys/net/ipv4/ip_default_ttl from 64 to 128. tcpdump inspection shows the new value is being respected in the outgoing request; however, the call still fails with ICMP time exceeded in-transit.
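
As an aside on option (2): the TTL can also be set per socket rather than system-wide. The following is a minimal Python sketch, not what was done in the question. The helper name `socket_with_ttl` is made up for illustration, and it relies only on the standard `IP_TTL` socket option. It changes the TTL of packets sent from that one socket and has no effect on the TTL the remote side chooses for its replies.

```python
import socket


def socket_with_ttl(ttl: int) -> socket.socket:
    """Return an unconnected IPv4 TCP socket whose outgoing packets use `ttl`.

    `socket_with_ttl` is a hypothetical helper, not part of any library.
    """
    if not 1 <= ttl <= 255:
        raise ValueError("TTL must be between 1 and 255")
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # IP_TTL overrides the system default (ip_default_ttl) for this socket only.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
    return sock


if __name__ == "__main__":
    sock = socket_with_ttl(128)
    # Read the option back to confirm the kernel accepted it.
    print(sock.getsockopt(socket.IPPROTO_IP, socket.IP_TTL))
    sock.close()
```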

Edit 1

Adding a Wireshark screengrab of the tcpdump capture from the host machine. (Image: tcpdump from EC2 host machine)

Edit 2

Adding another tcpdump, captured while curling that same host but from my local machine.

As the answer points out, the [SYN,ACK] response has a TTL that is too low for it to make it back to the machine initiating the request. In the image of me hitting that same server locally, you can see its TTL is about 200 lower than that of any other response from that server.

cmikeb1
  • Would you mind adding the output of "netstat -rn", from the host and from the container? I prefer text rather than images. – Gerard H. Pille May 22 '20 at 08:03
  • Talking of images, your Wireshark pic shows that the SYN,ACK you get back only has a time to live of "1", to which your system reacts with "TTL exceeded", obviously. Have you got something messing with the incoming TTL, e.g. iptables? – Gerard H. Pille May 22 '20 at 08:15
  • Another explanation would be that after the SYN,ACK, for the rest of the session, they set their TTL to 255. I have a question about your traceroute: what is it with the third line? – Gerard H. Pille May 22 '20 at 20:37
  • `3 ec2-52-15-0-157.us-east-2.compute.amazonaws.com...`. What do you find odd about it? I believe that hop is outside my VPC and is representing some kind of load balancing done by AWS. – cmikeb1 May 22 '20 at 21:29
  • Something I've never seen before on a traceroute, which proves one's never too old to learn. – Gerard H. Pille May 22 '20 at 21:34

1 Answer


The responses have a TTL of only 1 when they arrive at the host, which prevents them from being routed on to the container.
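
To spell out why a TTL of 1 is not enough: in the Docker setup the host acts as a router between the outside network and the container, and a router decrements the TTL before forwarding and drops the packet (answering with ICMP time exceeded) if the result is 0. A toy Python model of that single step, with `forward` as a made-up name rather than any real API:

```python
from typing import Optional


def forward(ttl_on_arrival: int) -> Optional[int]:
    """Model one router hop (hypothetical helper).

    Returns the TTL the packet carries after being forwarded,
    or None if the router drops it and reports ICMP time exceeded.
    """
    ttl_after = ttl_on_arrival - 1
    if ttl_after <= 0:
        return None  # dropped here; sender gets "time exceeded in-transit"
    return ttl_after


if __name__ == "__main__":
    print(forward(1))   # reply arriving at the Docker host with TTL 1
    print(forward(63))  # a packet with plenty of TTL left
```

A packet addressed to the host itself is delivered rather than forwarded, so this check does not apply to it, which would fit with curl succeeding from the EC2 host.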

Gerard H. Pille
  • Thanks @gerard, that's exactly what I was missing. I added another tcpdump showing the results of hitting the same data center from my local machine. The [SYN,ACK] response has a TTL that is about 200 lower than that of any other response from the server. Ever seen this before? Maybe some kind of standard or best practice? Or is this abnormal behavior? – cmikeb1 May 22 '20 at 14:57
  • My first guess: some kind of protection in your configuration. One can change the TTL with iptables' "mangle" table. So, your host may have been protected so as to prevent any packet from going further. Do you know how to check iptables? – Gerard H. Pille May 22 '20 at 16:15
  • Did you install the container via the ECS console? – Gerard H. Pille May 22 '20 at 16:33
  • Since I see the same behavior if I ping that server locally (see edit #2), from our prod environment, and from our non-prod environment, I'm certain this is something on the side of the server answering the request. It sounds like it's not a common best practice, so I'll have to follow up with the team running the server to track down the cause. – cmikeb1 May 22 '20 at 17:24
  • I wouldn't mind hearing how this turns out. – Gerard H. Pille May 22 '20 at 20:53
  • Turns out that the [SYN,ACK] response is sent with a TTL equal to the TTL the [SYN] request had upon arrival. So the request starts with a TTL of 64 and takes 33 hops to get to the host, which responds with a TTL of 64-33=31. Not sure why this is the case; I think it's likely some kind of security feature of their firewall. – cmikeb1 May 28 '20 at 16:34
  • If that is true, you could simply increase your TTL a little, et voilà. I thought you tried that. – Gerard H. Pille May 28 '20 at 16:46
  • What you describe is certainly not normal behaviour. Duckduckgo.com is 26 hops away from my system. I send a SYN with TTL 64. I get a SYN,ACK with a TTL of 37. This can only be if they send the SYN,ACK with a TTL of 64, not 64 - 26. – Gerard H. Pille May 28 '20 at 17:13
  • Can you run your tests on a physical system, not in the cloud? – Gerard H. Pille May 28 '20 at 17:15
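
The TTL arithmetic from the comments above can be sketched in a few lines of Python. This is a rough model under stated assumptions, not a description of what the remote firewall actually does: the function names are invented for illustration, the return path is assumed to have the same number of TTL decrements as the outbound path, and the figure of 33 decrements each way is taken from the comment's 64-33=31 calculation.

```python
def mirrored_reply_ttl(request_ttl: int, decrements_each_way: int) -> int:
    """TTL the server would put on its reply if it copies the TTL
    the request had on arrival (the behaviour described in the comment)."""
    return request_ttl - decrements_each_way


def reply_ttl_at_client(request_ttl: int, decrements_each_way: int) -> int:
    """TTL left when the reply gets back, assuming a symmetric return path."""
    reply_ttl = mirrored_reply_ttl(request_ttl, decrements_each_way)
    return reply_ttl - decrements_each_way


def reply_arrives(request_ttl: int, decrements_each_way: int) -> bool:
    """The reply only arrives if at least 1 TTL is left after the last decrement."""
    return reply_ttl_at_client(request_ttl, decrements_each_way) >= 1


if __name__ == "__main__":
    for ttl in (64, 128):
        print(ttl, mirrored_reply_ttl(ttl, 33), reply_arrives(ttl, 33))
```

Under these assumptions a request TTL of 64 leaves the reply short, while 128 would leave enough for it to return. That does not match the failure reported in the question after raising ip_default_ttl to 128, so the model is at best incomplete.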