I have two workers and a parameter server, i.e., three EC2 instances in total. The three instances need to communicate with each other (sending and receiving packets simultaneously).

All three instances have the same Security Group configuration:

Inbound: [image: inbound rules]

Outbound: [image: outbound rules]

Parameter-server instance runs the following code:

import pickle  # needed for pickle.loads below
import socket

TCP_IP = '0.0.0.0'
port = 8080
s = 0
MAX_WORKERS = 2

###other codes

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print("Connecting to port : ", port)
s.bind((TCP_IP, port))
s.listen(1)
conn, addr = s.accept()
print('Connection address:', addr)
conn, addr = s.accept()
print('Connection address:', addr)
k=0
while 1:
   size = safe_recv(8,conn)
   size = pickle.loads(size)
   data = safe_recv(size,conn)  

   ###other codes

   conn.sendall(size)
   conn.sendall(global_var_vals.value)
   ###Other codes
conn.close()
s.close()

Worker instances run the following code:

import pickle  # needed for pickle.loads below
import socket

TCP_IP = '<parameter-server ip>'
port = 8081 #for worker-1, 8082 for worker-2
port_main = 8080


###other codes

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((TCP_IP, port_main))
#receiving the variable values
recv_size = safe_recv(8, s)
recv_size = pickle.loads(recv_size)
recv_data = safe_recv(recv_size, s)
var_vals = pickle.loads(recv_data)
s.close()

###Other codes

# Opening the socket and connecting to server
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((TCP_IP, port))
while not mon_sess.should_stop():
   ###other codes

   s.sendall(send_size)
   s.sendall(send_data)
   #receiving the variable values
   recv_size = safe_recv(8, s)
   recv_size = pickle.loads(recv_size)
   recv_data = safe_recv(recv_size, s)
   var_vals = pickle.loads(recv_data)
   ###Other codes
s.close()

But when I run this code, it fails with "Connection timed out".

I also tried to connect with telnet, but every port except 22 shows the following error:

> telnet <parameter-server ip> 8080
Trying <parameter-server ip>...
telnet: Unable to connect to remote host: Connection timed out

But for port 22, it shows:

> telnet <parameter-server ip> 22
Trying <parameter-server ip>...
Connected to <parameter-server ip>
Escape character is '^]'.
SSH-2.0-OpenSSH_7.6p1 Ubuntu-4ubuntu0.3

How can I establish connection among the instances? Thanks in advance.

Leolime
  • Make sure that you can ping each instance from the others. Then remove your application from the picture and use a known application, such as netcat, to test connectivity on those ports. PS you should use the default VPC NACLs until you resolve this problem, and even after resolved you should probably not customize NACLs. – jarmod Oct 20 '19 at 21:11
  • Did you ever find the problem? – Ashaman Kingpin Oct 22 '19 at 23:08

1 Answer

Are the instances within the same VPC subnet? If not, check the NACL rules to make sure you are letting the ports you need through the subnet NACLs too.
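The telnet results in this thread show two different failures: "Connection timed out" usually means packets are being dropped on the way (for example by a security group, NACL, or host firewall), while "Connection refused" means the host was reachable but nothing was listening on that port. A minimal sketch of a probe that tells the two apart, assuming Python 3 is available on the instances; `probe` is a hypothetical helper name, not part of the original scripts:

```python
import socket


def probe(host, port, timeout=3.0):
    """Try one TCP connection and classify the result.

    Returns "open", "refused", "timeout", or "error: ..." for anything else.
    """
    try:
        # create_connection resolves the host and applies the timeout to connect()
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Host reachable, but no process is listening on this port
        return "refused"
    except socket.timeout:
        # No reply at all: traffic is probably being dropped by a filter
        return "timeout"
    except OSError as exc:
        # Unreachable network, name resolution failure, and so on
        return f"error: {exc}"
```

Running `probe("127.0.0.1", 8080)` on the parameter server while the script is running, and then `probe("<parameter-server ip>", 8080)` from a worker, should narrow the problem down to either the script or the network path.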

Ashaman Kingpin
  • I'm assuming from your question that you didn't create one explicitly and are therefore using a default subnet, which should be associated with a NACL that lets all traffic through. But please confirm this via the AWS management console. – Ashaman Kingpin Oct 20 '19 at 12:36
  • Also check the Ubuntu firewall and ensure it is not blocking all ports except port 22. What is the output of running this command: "sudo iptables -L"? – Ashaman Kingpin Oct 20 '19 at 12:37
  • Chain INPUT (policy ACCEPT)\\ target prot opt source destination \\ Chain FORWARD (policy ACCEPT)\\ target prot opt source destination \\ Chain OUTPUT (policy ACCEPT)\\ target prot opt source destination\\ this is the output @Ashman, "\\" are for new lines – Leolime Oct 20 '19 at 12:47
  • And you verified from the parameter server itself that you can connect to port 8080 once you start your script i.e. telnet 127.0.0.1 8080 from the parameter server itself? – Ashaman Kingpin Oct 20 '19 at 12:57
  • No, telnet 127.0.0.1 8080 also returns "telnet: Unable to connect to remote host: Connection refused" – Leolime Oct 20 '19 at 13:04
  • Ah well, there is your problem: even after starting the script you are not able to connect from localhost itself. You did start the script, though, right? Because I tried your server script and it is successfully listening on port 8080. – Ashaman Kingpin Oct 20 '19 at 13:10
  • What should I do now? Do I need to create a new instance with a VPC subnet? – Leolime Oct 20 '19 at 13:42
  • No, that won't help at all since even within one instance, if you start your server script, you are not able to connect to port 8080. You need to figure out why that is. Again, I tried your server script locally and was able to confirm the port was open. – Ashaman Kingpin Oct 20 '19 at 13:44
  • It works with localhost now, but sending and receiving packets between the instances still doesn't work – Leolime Oct 20 '19 at 14:16
  • Hmm, since your script only allows 2 connections before basically looping indefinitely (and you should set SO_REUSEADDR flag, i.e. s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR,1) after the socket.socket call) it's hard to pinpoint the issue. Add the setsockopt call I just suggested and test it from one of your workers. – Ashaman Kingpin Oct 20 '19 at 14:25
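The SO_REUSEADDR suggestion above can be sketched as follows. This is only an illustration under assumptions, not the asker's actual server: `open_listener` and `accept_workers` are hypothetical helper names, and the sketch also keeps every accepted connection in a list, since the script in the question assigns both `accept()` results to the same `conn` variable and so only ever talks to the second worker.

```python
import socket

MAX_WORKERS = 2  # same constant as in the question


def open_listener(host="0.0.0.0", port=8080, backlog=MAX_WORKERS):
    """Create a listening TCP socket with SO_REUSEADDR set before bind()."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Lets the script rebind the port right after a restart
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((host, port))
    s.listen(backlog)
    return s


def accept_workers(listener, n=MAX_WORKERS):
    """Accept n connections and keep all of them instead of overwriting one."""
    conns = []
    for _ in range(n):
        conn, addr = listener.accept()
        print("Connection address:", addr)
        conns.append(conn)
    return conns
```

Note also that the server in the question listens only on port 8080, while each worker later reconnects on its own `port` (8081 or 8082). Unless something is listening on those ports and the security group allows them, that second `connect` call cannot succeed.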