This is a puzzler, and I'm hoping that by writing a StackOverflow question, I gain some fresh insights.

In a nutshell, I'm trying to figure out why I can access https://sts.nih.gov from a host machine, but not from a Docker container on the same host, when other sites work just fine from that container.

How I reproduce the problem...

I have a cloud-based machine (DigitalOcean) which can happily establish an HTTPS connection to sts.nih.gov:

# from host machine
curl -vv -o /tmp/test https://sts.nih.gov

If I get a shell in a fresh Docker container, I cannot access that site:

 # get a shell within a container 
 docker run -ti ubuntu:18.04 /bin/bash

 # attempt same request...
 curl -vv --ipv4 -o /tmp/test https://sts.nih.gov
* Rebuilt URL to: https://sts.nih.gov/
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 128.231.243.251...
* TCP_NODELAY set
  0     0    0     0    0     0      0      0 --:--:--  0:00:31 --:--:--     0* connect to 128.231.243.251 port 443 failed: Connection timed out
* Failed to connect to sts.nih.gov port 443: Connection timed out
* Closing connection 0
curl: (7) Failed to connect to sts.nih.gov port 443: Connection timed out

One interesting detail: without the --ipv4 flag, curl attempted to connect over IPv6 and failed as well.

Does this happen for all accesses to external hosts?

Nope, within a docker container, curl -o /tmp/test https://serverfault.com/ works just fine, for example.

Is it a DNS problem?

No, nslookup is able to resolve the address within the container:

nslookup sts.nih.gov
Server:     67.207.67.3
Address:    67.207.67.3#53

Non-authoritative answer:
sts.nih.gov canonical name = sts.ha.nih.gov.
Name:   sts.ha.nih.gov
Address: 128.231.243.251
Name:   sts.ha.nih.gov
Address: 2607:f220:404:9124:128:231:243:251

I can use the IP address directly in the request too:

curl -vv -o /tmp/test https://128.231.243.251

Same result - a timeout.

Is it specific to https?

No, this seems to be a TCP/IP issue rather than an HTTPS protocol issue. A plain connectivity check with netcat fails too:

netcat -zvn 128.231.243.251 443
(UNKNOWN) [128.231.243.251] 443 (?) : Connection timed out
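
The same check can be repeated without installing netcat, which makes it easier to run identically on the host and in a fresh container. This is a minimal sketch, assuming bash and coreutils `timeout` are available (both ship in the ubuntu:18.04 image); `tcp_check` is a hypothetical helper name, not an existing tool.

```shell
#!/usr/bin/env bash
# tcp_check HOST PORT [TIMEOUT_SECONDS]
# Attempts a plain TCP connect through bash's /dev/tcp pseudo-device.
# Prints "open" and returns 0 on success; prints "failed" and returns 1
# if the connection is refused or does not complete within the timeout.
tcp_check() {
  local host="$1" port="$2" secs="${3:-5}"
  # Inside the child shell, $0 is the host and $1 is the port.
  if timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
    echo "open"
  else
    echo "failed"
    return 1
  fi
}
```

Running `tcp_check 128.231.243.251 443` and `tcp_check serverfault.com 443` on the host and then inside the container gives a four-way comparison that involves no TLS or HTTP at all.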

Is it a routing issue?

It doesn't seem to be - after all, the host can access the problem site, and the docker container can access other external sites.

Traceroute shows that probes (UDP by default with Linux traceroute) at least reach the target network:

traceroute 128.231.243.251
traceroute to 128.231.243.251 (128.231.243.251), 30 hops max, 60 byte packets
 1  172.17.0.1 (172.17.0.1)  0.063 ms  0.029 ms  0.023 ms
 2  * * *
 3  10.80.5.46 (10.80.5.46)  1.758 ms 10.80.5.48 (10.80.5.48)  1.864 ms 10.80.5.38 (10.80.5.38)  4.499 ms
 4  138.197.249.112 (138.197.249.112)  1.991 ms 138.197.249.122 (138.197.249.122)  2.179 ms 138.197.249.104 (138.197.249.104)  1.961 ms
 5  138.197.251.136 (138.197.251.136)  1.659 ms 138.197.251.142 (138.197.251.142)  1.846 ms 138.197.251.138 (138.197.251.138)  1.799 ms
 6  212.187.195.149 (212.187.195.149)  4.005 ms 212.187.195.85 (212.187.195.85)  1.800 ms  1.743 ms
 7  * * *
 8  4.16.68.166 (4.16.68.166)  76.945 ms  76.901 ms  76.869 ms
 9  bth-tic-core-rt-a-te-0-0-0-0.net.nih.gov (156.40.93.1)  77.783 ms  77.754 ms  77.632 ms
10  156.40.93.170 (156.40.93.170)  76.519 ms  76.473 ms  76.429 ms
11  156.40.93.171 (156.40.93.171)  77.745 ms  76.627 ms  77.020 ms
12  * * *
...
30  * * *

I can also show a good trace using TCP SYN packets:

traceroute --tcp 128.231.243.251
traceroute to 128.231.243.251 (128.231.243.251), 30 hops max, 60 byte packets
 1  172.17.0.1 (172.17.0.1)  0.066 ms  0.017 ms  0.017 ms
 2  * * *
 3  10.80.5.34 (10.80.5.34)  1.881 ms 10.80.5.46 (10.80.5.46)  2.113 ms 10.80.5.36 (10.80.5.36)  1.832 ms
 4  138.197.249.98 (138.197.249.98)  3.127 ms 138.197.249.120 (138.197.249.120)  1.978 ms 138.197.249.106 (138.197.249.106)  1.853 ms
 5  138.197.251.140 (138.197.251.140)  1.784 ms  1.826 ms 138.197.251.132 (138.197.251.132)  1.705 ms
 6  212.187.195.149 (212.187.195.149)  2.859 ms  1.457 ms  1.389 ms
 7  * * *
 8  4.16.68.166 (4.16.68.166)  76.470 ms  76.446 ms  76.520 ms
 9  bth-tic-core-rt-a-te-0-0-0-0.net.nih.gov (156.40.93.1)  77.602 ms  77.582 ms  77.492 ms
10  156.40.93.170 (156.40.93.170)  76.005 ms  76.733 ms  76.459 ms
11  * * *
12  * * *
13  * * *
14  * * *
15  * * *
16  * * *
17  128.231.243.251 (128.231.243.251)  77.268 ms  77.215 ms  76.815 ms

Next moves?

At this point, I'm baffled as to how to narrow this down further. To me, it feels like there's something about the networking at the remote end which is unusual but only manifests itself within docker's networking mechanisms.
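
One way to narrow this down might be to capture the container's traffic to the problem address on the docker0 bridge and again on the host's outbound interface while the curl command runs. The sketch below is only illustrative: `capture_filter` and `capture_iface` are hypothetical helper names, and `eth0` in the usage note is an assumed interface name that may differ on a given droplet.

```shell
#!/usr/bin/env bash
# Build a tcpdump filter expression for one destination IP and TCP port.
capture_filter() {
  local ip="$1" port="$2"
  printf 'host %s and tcp port %s' "$ip" "$port"
}

# Capture matching packets on one interface for a fixed time (needs root).
# The default of 40 seconds is slightly longer than the curl timeout seen above.
capture_iface() {
  local iface="$1" ip="$2" port="$3" secs="${4:-40}"
  timeout "$secs" tcpdump -n -i "$iface" "$(capture_filter "$ip" "$port")"
}
```

For example, run `capture_iface docker0 128.231.243.251 443` in one terminal and `capture_iface eth0 128.231.243.251 443` in another, then start curl in the container. If SYNs show up on docker0 but never on the outbound interface, the host's forwarding or NAT is the likely suspect; if they leave the host and no SYN-ACK comes back, the drop is probably happening further along the path or at the remote end.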

Paul Dixon
  • What do you see if you run `tcpdump -n -i docker0` on the host while running the `curl` command inside the container? – larsks Aug 13 '22 at 03:55
  • Like the other comment indicated, use a tool (either tcpdump or another) to capture the actual SSL/TLS handshakes and see for the host/guest pair what exactly happens when TCP connections to port 443 of `sts.nih.gov` are made. The differences from the packets should contain enough hints on what can be wrong. – Lex Li Aug 13 '22 at 05:28
  • Thanks for the suggestions. Unfortunately, when I continued the investigation, the problem had gone away. It was almost certainly something at the remote end; if I find out what it was, I'll post an answer. – Paul Dixon Aug 14 '22 at 09:59

1 Answer

You need to create a new bridge Docker network and attach the container to it. You should then be able to connect. If you still can't, some Docker services are probably broken, so just restart Docker. I had this problem too.
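
The suggestion above could be tried roughly as follows. This is a sketch rather than a confirmed fix: `testnet` is a made-up network name, and the `run` and `try_custom_bridge` helpers are hypothetical wrappers that exist only so the commands can be previewed with `DRY_RUN=1` before anything is executed.

```shell
#!/usr/bin/env bash
# Set DRY_RUN=1 to print the docker commands instead of executing them.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# NET_NAME is an arbitrary example name for the user-defined bridge network.
NET_NAME="${NET_NAME:-testnet}"

# Create a user-defined bridge network, then start a container attached to it.
try_custom_bridge() {
  run docker network create --driver bridge "$NET_NAME"
  run docker run --rm -ti --network "$NET_NAME" ubuntu:18.04 /bin/bash
}
```

Call `try_custom_bridge` and repeat the curl test from the resulting shell. On a systemd-based host, restarting Docker is typically `sudo systemctl restart docker`, though that depends on how Docker was installed.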

  • While I can no longer reproduce the issue, I don't think this would have helped as the network was capable of reaching *other* external hosts. – Paul Dixon Aug 24 '22 at 16:51
  • I had the same issue, and this solved my problem. It's a known bug. Whatever! :) – Chris Kosch Sep 01 '22 at 14:58