
I think this relates only to the TCP layer, but I describe my setup below:

On Google Compute Engine I set up an HTTP and websocket server (Python, geventwebsocket + gevent.WSGIServer). At home I have my computer (esp8266) that connects to it using websockets.

I use websockets because I need bidirectional communication (a couple of messages a day, it goes like this: a message from server, a response from client.) The connection itself is initiated by the client, as it's behind a NAT.

The problem is that a couple of seconds after the last packet exchange, messages from the server no longer reach the client. However, the client can still send packets to the server even minutes later (and possibly much longer). Interestingly, once it does, the (probably retransmitted) packets from the server finally arrive.

I verified with Wireshark that the packets are indeed sent from the server (and retransmitted if not ACKed), and I log all network communication on the client, so the problem probably isn't in the application software. I get no exceptions in the applications, and the connections stay open.

I measured how long after connection setup (or after the last delivered packet) the server can still deliver packets: it is between 6 and 20 seconds, varying between tests. In each test the server sends packets with a fixed delay between them.

In a test (a couple of packets) with a single fixed delay, usually either all packets arrive or none do (if one doesn't arrive, the next ones won't either).

I suspect the NAT is the cause. But then the only solution I see is to periodically (every 6 seconds or less) send keep-alive packets (websocket Pings and Pongs, or TCP keepalive) from the client. That doesn't seem elegant, as there should be only a few data messages a day.
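For reference, the TCP keepalive variant can be sketched in Python as below. This is only an illustration for a desktop-class client: the helper name `enable_keepalive` and the 5-second timer values are assumptions chosen to stay under the observed 6-second window, and the per-socket timer options are platform-specific, so they are applied only when the running Python exposes them. An ESP8266 client would need the equivalent setting in its own network stack.

```python
import socket


def enable_keepalive(sock, idle=5, interval=5, count=3):
    """Turn on TCP keepalive and, where the platform allows, tune its timers.

    idle:     seconds of silence before the first keepalive probe
    interval: seconds between unanswered probes
    count:    unanswered probes before the connection is declared dead
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These option names are not available everywhere (they are on Linux),
    # so each one is set only if this Python build defines it.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    # Non-zero means keepalive is enabled on this socket.
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
    s.close()
```

The options have to be set on the client's socket for the probes to count as outbound traffic at the NAT; whether a given NAT treats keepalive probes as refreshing its mapping is something that would have to be tested.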

A similar thing happens when ssh'ing from my desktop to the server: after a couple of seconds of inactivity on both my side and the server's, the server stops sending anything. I tested this e.g. with watch -n20 date: sometimes it just freezes and doesn't update until I press a key (i.e. send a packet from the client). In the ssh case the update is not instant, though; it takes a couple of seconds after the keypress to see new output. (Edit: of course that must be due to the retransmission timer algorithm.)


So I studied the purpose of TCP keep-alive packets etc., and the thing is that routers and NATs forget connections (or mappings) after some time, or keep only the newest ones. (So I guess that in the client->server direction the mapping is simply recreated, as the destination IP is public and belongs to the actual server. In the opposite direction that is not possible, so it doesn't work.)

But I didn't think it could be as bad as 6 seconds. With such a short timeout, websockets almost reduce to polling (although with a possibly smaller lag).
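The behaviour guessed at above can be pictured with a toy model of a NAT mapping table. This is a sketch of the hypothesis only, not of any real device: the class `ToyNat`, the flow label `"ws"` and the 6-second timeout are made up for illustration, and it assumes that only outbound packets create or refresh a mapping.

```python
class ToyNat:
    """Toy model of a NAT mapping table with an idle timeout (not a real NAT)."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}  # flow id -> time of the last outbound packet

    def outbound(self, flow, now):
        # A packet from the inside always creates or refreshes the mapping.
        self.last_seen[flow] = now
        return True

    def inbound(self, flow, now):
        # A packet from the outside is forwarded only while the mapping is alive.
        # Assumption: inbound traffic does not refresh the mapping.
        seen = self.last_seen.get(flow)
        if seen is None or now - seen > self.timeout:
            self.last_seen.pop(flow, None)
            return False
        return True


if __name__ == "__main__":
    nat = ToyNat(timeout=6)
    print(nat.outbound("ws", now=0))   # client opens the connection
    print(nat.inbound("ws", now=4))    # server packet within the timeout
    print(nat.inbound("ws", now=20))   # server packet after the mapping expired
    print(nat.outbound("ws", now=25))  # client packet recreates the mapping
    print(nat.inbound("ws", now=26))   # server retransmission gets through
```

In this model a server packet is dropped once the mapping has idled out, and the next client packet makes the server's retransmissions deliverable again, which matches the symptoms described in the question.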

Adam
  • This really sounds like the NAT router has an unreasonably short timeout on its NAT table entries. What's the device doing NAT? Does it have any relevant config options? Is there any way to do a packet capture on the WAN side of the NAT router? – Gordon Davisson Jun 11 '20 at 21:56
  • @Gordon Davisson I only have access to the closest NAT router (and there are more -- the WAN address doesn't match my public IP address). I might be able to do packet capture there. Are you sure it cannot be the non-NAT IP routers? They don't keep per connection tables? It's also a route from US to EU. – Adam Jun 11 '20 at 22:31
  • The router(s) doing NAT are the most likely suspects for a problem like this, but not the only possible ones. It's normal for different packets in the same TCP session to take different routes, so a normal router is expected to forward packets that're part of a session it hadn't seen before. NAT routers necessarily track TCP sessions so they can properly translate incoming packets (and the symptom you're seeing matches what I'd expect from a NAT router not knowing how to translate an incoming packet). Some firewalls do similar things, so that's another possible suspect. – Gordon Davisson Jun 11 '20 at 22:46
  • Thx for the explanation! I’ll take a look at the router config and try to capture some packets. But it’s only one NAT of many – Adam Jun 11 '20 at 22:52
  • It sounds like your ISP is using CGN, so you will not have an actual public address on your router. It will either be an address in the Shared space (what the ISP is supposed to do) or an address in one of the Private spaces (what many ISPs do, even though it is not recommended). In either case, you have no control over the ISP NAPT router, and it probably sets a very short timeout period on purpose because you are sharing the NAPT tables with many other ISP customers. – Ron Maupin Jun 11 '20 at 23:38
  • Does your ISP provide IPv6 connectivity? That doesn't use NAT (unless your ISP is *really* bad), so it probably won't have the same problem. Alternately.. what other ISPs are available in your area, and are they as deep in NATs as your current provider? – Gordon Davisson Jun 15 '20 at 03:25

1 Answer


It seems that the router's NAT mechanism may be causing the problem. Maybe you can use a small tool that speaks NAT-PMP or UPnP to open a port and map it to your local client. The mapping will last long enough for you to do bidirectional communication.

tyChen
  • So I looked at the IP addresses (of my router and the perceived external, public one) and judging by them I’m behind at least three NAT devices: my router (I have control of it) and another two NATs. For the closer one I might know the IP address, but for the other(s) I don’t. I might try to do a traceroute, but otherwise I have no idea how to discover the addresses. – Adam Jun 13 '20 at 10:01
  • If I knew all the addresses, I might try to reserve port mappings with the protocols you suggest: IGD, NAT-PMP or the latest PCP. I don’t know if the first two would work behind many NATs. PCP might, judging by the last paragraph of https://tools.ietf.org/html/rfc6887#section-8.1 – Adam Jun 13 '20 at 10:09
  • Also I don’t know how widely those protocols are supported on ISP’s hardware generally (and if the services are enabled on the particular routers.) Anyway the protocols are designed to solve my problem. – Adam Jun 13 '20 at 10:16
  • These protocols will return the external IP to your client no matter how many NAT routers you are connected through, so it can work in your circumstances if your router supports them. If not, maybe you must use a central server to collect the information and then act as a relay. At this point, ICE may help you. – tyChen Jun 13 '20 at 11:25