I am trying to connect from a web server EC2 instance to an Elasticsearch server EC2 instance. The connection is slow to nonexistent from the EC2 instance, but very fast from a normal computer (outside AWS).

If I make the request from my laptop, it's fast:

laptop:~ jordan$ time curl -vvv search.example.org:9200
* About to connect() to search.example.org port 9200 (#0)
*   Trying 1.2.3.4... connected
* Connected to search.example.org (1.2.3.4) port 9200 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.19.7 (universal-apple-darwin10.0) libcurl/7.19.7 OpenSSL/0.9.8y zlib/1.2.3
> Host: search.example.org:9200
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Type: application/json; charset=UTF-8
< Content-Length: 294
< 
… snip …
* Connection #0 to host search.example.org left intact
* Closing connection #0

real    0m0.071s
user    0m0.004s
sys     0m0.005s
laptop:~ jordan$ 

From the EC2 instance, the request first tries one of the load balancer's addresses:

[jordan@ip-5-6-7-8 ~]$ time curl -vvv search.example.org:9200
* Rebuilt URL to: search.example.org:9200/
* Hostname was NOT found in DNS cache
*   Trying 1.2.3.4...

Then it tries the other address:

* connect to 1.2.3.4 port 9200 failed: Connection timed out
*   Trying 9.10.11.12...

Before giving up entirely:

* connect to 9.10.11.12 port 9200 failed: Connection timed out
* Failed to connect to search.example.org port 9200: Connection timed out
* Closing connection 0
curl: (7) Failed to connect to search.example.org port 9200: Connection timed out
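To see which of the load balancer's addresses fails, without waiting for curl's full connect timeout on each one, you could probe every resolved address separately. This is a minimal sketch, assuming bash (for its `/dev/tcp` redirection), coreutils `timeout`, and glibc `getent` on the Linux instance. The function names `probe_addr` and `probe_all` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: probe every IPv4 address a hostname resolves to,
# giving each TCP connect attempt a short timeout instead of curl's default.
# Assumes bash (/dev/tcp), coreutils `timeout`, and glibc `getent`.

probe_addr() {
  # $1 = IPv4 address, $2 = port, $3 = timeout in seconds
  if timeout "$3" bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
    echo "$1:$2 open"
  else
    echo "$1:$2 unreachable"
    return 1
  fi
}

probe_all() {
  # $1 = hostname, $2 = port, $3 = timeout in seconds (default 3)
  local host=$1 port=$2 secs=${3:-3} addrs addr rc=0
  # getent prints one line per address and socket type; keep unique addresses.
  addrs=$(getent ahostsv4 "$host" | awk '{print $1}' | sort -u)
  if [ -z "$addrs" ]; then
    echo "no IPv4 addresses for $host"
    return 2
  fi
  for addr in $addrs; do
    probe_addr "$addr" "$port" "$secs" || rc=1
  done
  return "$rc"
}
```

Running `probe_all search.example.org 9200` and then `probe_all search.example.org 80` from the web server would show, within a few seconds, whether every address behind the name fails on 9200 while succeeding on 80.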

When I look at the stats for my ELB, it shows a number of "Backend Connection Errors".

Note that search.example.org is a DNS name pointing to an ELB. However, even if I request the backend instance directly, the connection still times out:

[jordan@ip-5-6-7-8 ~]$ time curl -vvv ec2-40-41-42-43.compute-1.amazonaws.com:9200
* Rebuilt URL to: ec2-40-41-42-43.compute-1.amazonaws.com:9200/
* Hostname was NOT found in DNS cache
*   Trying 40.41.42.43...

Yet the same request is still fast from outside EC2:

laptop:~ jordan$ time curl -vvv ec2-40-41-42-43.compute-1.amazonaws.com:9200
* About to connect() to ec2-40-41-42-43.compute-1.amazonaws.com port 9200 (#0)
*   Trying 40.41.42.43... connected
* Connected to ec2-40-41-42-43.compute-1.amazonaws.com (40.41.42.43) port 9200 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.19.7 (universal-apple-darwin10.0) libcurl/7.19.7 OpenSSL/0.9.8y zlib/1.2.3
> Host: ec2-40-41-42-43.compute-1.amazonaws.com:9200
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Type: application/json; charset=UTF-8
< Content-Length: 294
< 
… snip …
* Connection #0 to host ec2-54-85-45-128.compute-1.amazonaws.com left intact
* Closing connection #0

real    0m0.864s
user    0m0.006s
sys     0m0.011s
laptop:~ jordan$ 

I have nginx running on the search server, and accessing it from anywhere, including the other EC2 instance, is fast. So the problem appears to occur only when I access port 9200. Note that all of the servers mentioned are in a shared security group, which allows inbound access on port 9200.

The connection does succeed if I use the private IP address. However, I'd prefer not to create an internal load balancer if I can resolve this issue some other way.
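One way to pin down the pattern described above (port 80 works, 9200 and 22 don't, and the private IP works) is to print a small address × port table from the web server. This is a hypothetical sketch, assuming bash (`/dev/tcp`) and coreutils `timeout`. The function names are illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print an address x port reachability table from the
# web server, to compare the search server's public and private addresses
# on the ports mentioned above (22, 80, 9200).
# Assumes bash (/dev/tcp) and coreutils `timeout`.

can_connect() {
  # Succeeds if a TCP connection to $1:$2 opens within $3 seconds.
  timeout "$3" bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

reachability_matrix() {
  # $1 = space-separated addresses, $2 = space-separated ports,
  # $3 = per-attempt timeout in seconds (default 3)
  local addrs=$1 ports=$2 secs=${3:-3} addr port state
  for addr in $addrs; do
    for port in $ports; do
      if can_connect "$addr" "$port" "$secs"; then state=open; else state=failed; fi
      printf '%s\t%s\t%s\n' "$addr" "$port" "$state"
    done
  done
}
```

Usage would be something like `reachability_matrix "40.41.42.43 10.0.0.5" "22 80 9200"`, where `10.0.0.5` is a placeholder for the search server's real private IP. If the public address only opens on 80 while the private address opens on every port, that would suggest the traffic to the public address is being filtered somewhere along the way, rather than Elasticsearch refusing connections.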

Jordan Reiter
  • Looks like a network connectivity issue (note that access from the EC2 web server is not slow; it just doesn't connect). Check for any connection from EC2 (ssh or ping, for example; make sure they are enabled in the security group) and also check for iptables filtering (outgoing on the web server, incoming on the search server). Running tcpdump on both servers while trying the connection to 9200 would confirm or rule out a network connectivity issue. – LinuxDevOps Mar 24 '14 at 21:01
  • They're all in the same security group and for good measure I went ahead and temporarily added the All TCP rule with the security group as Custom IP. None of the servers have anything set for iptables. I'll try a tcpdump and see what comes up. I was able to connect via port 80 to one of the servers, but not any other ports (including ssh). Also able to connect if I use the internal IP address. – Jordan Reiter Mar 24 '14 at 21:30
  • So from another EC2 instance you cannot ssh into the search server, but you can from other computers? Also, can you check whether there's a difference in `dig ec2-40-41-42-43.compute-1.amazonaws.com` (the search server host name) between the EC2 instance and your laptop? – LinuxDevOps Mar 24 '14 at 22:25
  • No difference when calling dig (apart from nameserver of course, since my laptop doesn't use AWS nameserver!). Could connect to some ports (like 80) but many ports just did not work (22, 9200, etc). Finally worked when I set up an internal ELB, but it's kind of a shame to have to have two ELBs side-by-side. – Jordan Reiter Mar 25 '14 at 14:37
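The tcpdump check suggested in the comments could look roughly like the sketch below. You would run the same capture on both servers while repeating the curl request from the web server. The interface name `eth0` and the helper names are assumptions. The filter matches on the port only, because the search server sees its private address, not the public one, as the destination of incoming packets.

```shell
#!/usr/bin/env bash
# Sketch of the tcpdump check suggested in the comments. Run the capture on
# both servers while repeating the curl request from the web server.
# The default interface name (eth0) is an assumption; check `ip link`.

syn_filter() {
  # pcap filter: TCP segments on port $1 with the SYN flag set
  # (matches both the client's SYN and the server's SYN-ACK).
  echo "tcp port $1 and tcp[tcpflags] & (tcp-syn) != 0"
}

capture_syns() {
  # $1 = port, $2 = interface (default eth0); -nn keeps raw IPs and ports.
  sudo tcpdump -nn -i "${2:-eth0}" "$(syn_filter "$1")"
}
```

With `capture_syns 9200` running on both machines, there are two possible outcomes. If the web server shows repeated SYNs but the search server never sees them, the packets are dropped before they reach the search instance. If the SYN arrives and a SYN-ACK leaves but never shows up on the web server, the problem is on the return path.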

1 Answer

AWS optimizes connections going out, not connections from inside to inside. If you're using the public IPs it gives you, your connection is likely leaving AWS (or at least reaching the edge router) and then routing back into AWS.

If you have two instances inside AWS, use their private IPs. The connections then go between local switches (that's a bit oversimplified, because the two servers could still be physically far apart).
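To confirm which path a client will take, you can check whether the name it connects to resolves to private or public addresses. This is a hypothetical helper, assuming glibc `getent` and IPv4 only. It uses the RFC 1918 private ranges as the definition of "private".

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether each address a hostname resolves to
# is a private (RFC 1918) address or a public one.
# Assumes glibc `getent`; handles IPv4 only.

is_private_ipv4() {
  # RFC 1918 ranges: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
  case "$1" in
    10.*|192.168.*) return 0 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
    *) return 1 ;;
  esac
}

classify_host() {
  # $1 = hostname; prints "<address> private" or "<address> public" per
  # resolved IPv4 address (anything outside RFC 1918 is reported as public).
  local addr
  for addr in $(getent ahostsv4 "$1" | awk '{print $1}' | sort -u); do
    if is_private_ipv4 "$addr"; then
      echo "$addr private"
    else
      echo "$addr public"
    fi
  done
}
```

Running `classify_host search.example.org` on the web server would show whether the name your application uses points at public addresses, which is the case this answer warns about.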

If you still see bad performance, check your instance sizes: small and micro instances have very high latency. Lastly, you can create a VPC, which is like a little cloud within their cloud. You can use your own virtual router, and they'll try to colocate servers for you, so they are physically very close (if not in the same rack).

comjf
  • I'll reword my question. It's not a slow connection but a nonexistent one. No matter how unoptimized the route may be, it should still *eventually* connect, say within 10 seconds. Using the internal IPs works, but that is not ideal, as I was placing the servers behind a load balancer. – Jordan Reiter Mar 24 '14 at 21:09
  • @Jordan Did you find an answer? – PKHunter Oct 11 '15 at 06:08
  • No, I didn't. Resorted to making a separate, internal-only load balancer and using that. – Jordan Reiter Oct 16 '15 at 23:17