I'm running into an interesting problem while setting up a high-availability file system cluster on EC2. The idea behind the setup is simple: two GlusterFS nodes sit in two separate availability zones and synchronize data between themselves. I can mount either of these two servers from any other EC2 instance without any problems.
However, in the interest of spreading the load and being able to migrate off bad nodes, I want to put both servers behind a load balancer. This seemed simple enough: I opened the ports on the load balancer and then set the mount host to the load balancer instead of an individual GlusterFS node. The client, however, insists that it can't make the connection. I thought this might be a firewall issue, so to rule that out I opened ports 1024-65535. A terrible idea for sure, but I needed to rule it out.
Here's what the logs say:
[2013-04-24 21:51:03.581564] I [glusterfsd.c:1666:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.3.1
[2013-04-24 21:51:03.608884] W [socket.c:1512:__socket_proto_state_machine] 0-glusterfs: reading from socket failed. Error (Transport endpoint is not connected), peer (1.2.3.4:24007)
The strange part is, I can connect to that IP fine via telnet on the same port.
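For anyone who wants to reproduce the telnet check without telnet, it amounts to roughly the following. This is only a minimal sketch using Python's standard `socket` module; `can_connect` is a name I made up for illustration, and the host and port you pass in would be whatever your own setup uses (for me, the load balancer address and port 24007 from the log above).

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a plain TCP connection to host:port can be opened.

    This mirrors what a telnet test proves: only that something accepts
    the TCP connection. It says nothing about whether the GlusterFS
    protocol exchange that follows succeeds.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake;
        # the "with" block closes the socket again straight away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Calling `can_connect("1.2.3.4", 24007)` against the address from the log returns `True` in my case, which matches the telnet result: the TCP handshake goes through, yet the mount still fails.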
Has anyone done this before, or does anyone have any insight into how I can work around this?
Thanks!