In its documentation ( https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/service_discovery#logical-dns ) for Logical DNS service discovery, Envoy says that it:

"only uses the first IP address returned when a new connection needs to be initiated"

How does Envoy decide when a new upstream connection needs to be initiated?

It also says:

"Connections are never drained"

What happens to old connections if an upstream host becomes unreachable? Do health-checks apply to all the upstream hosts that currently have established connections or are they only monitoring the host with the current "first IP address"? If the latter, am I right to assume that Envoy will only remove the failed upstream connection (and consequently stop trying to send traffic to those hosts) once it tries to write to it and the peer ACK times out? If so, is it possible to configure the timeout duration?

andresp

1 Answer

After looking into the code and doing some tests, this is what I found:

How does Envoy decide when a new upstream connection needs to be initiated?

  • In the case of the TCP proxy (the filter I was using), there is a 1:1 mapping between downstream and upstream connections, so a new upstream connection is established whenever a new downstream connection is established.

What happens to old connections if an upstream host becomes unreachable?

  • It depends on whether the connection was explicitly terminated (a TCP RST packet was sent) or not. If it was, the upstream connection is destroyed, along with the downstream connection. If it was not, nothing happens until the TCP connection times out (I believe due to TCP_USER_TIMEOUT or the tcp_retries2 retransmission limit; it took more than 15 minutes on my local machine).
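
The "more than 15 minutes" figure is roughly what the Linux default of tcp_retries2 = 15 would predict. As a rough sketch only: the function below assumes the retransmission timeout (RTO) starts at a 200 ms minimum, doubles on every retry, and is capped at 120 s, which are the bounds the kernel documentation uses for its hypothetical estimate. The function name and parameters are illustrative, and a real connection will differ because its RTO is derived from the measured round trip time.

```python
def estimated_retransmit_timeout(retries=15, rto_min=0.2, rto_max=120.0):
    """Approximate seconds before Linux gives up on an unacknowledged segment.

    Assumes the RTO starts at rto_min, doubles after each retransmission,
    and is capped at rto_max. The wait covers the original transmission
    plus `retries` retransmissions, i.e. retries + 1 intervals.
    """
    total = 0.0
    rto = rto_min
    for _ in range(retries + 1):
        total += rto
        rto = min(rto * 2, rto_max)
    return total


if __name__ == "__main__":
    for retries in (8, 15):
        seconds = estimated_retransmit_timeout(retries)
        print(f"tcp_retries2={retries}: ~{seconds:.1f} s (~{seconds / 60:.1f} min)")
```

With the default of 15 this comes out to about 924.6 seconds, a little over 15 minutes, which matches what I observed.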

Do health-checks apply to all the upstream hosts that currently have established connections or are they only monitoring the host with the current "first IP address"?

  • They only apply to the current "first IP address".

If the latter, am I right to assume that Envoy will only remove the failed upstream connection (and consequently stop trying to send traffic to those hosts) once it tries to write to it and the peer ACK times out?

  • Yes. Typically, though, the downstream clients' timeouts will kick in first and destroy the connection.

If so, is it possible to configure the timeout duration?

  • I couldn't find an option to set the socket's TCP_USER_TIMEOUT in Envoy. Changing the OS-level tcp_retries2 setting might help but, according to the documentation, the total time also depends on the smoothed round trip time of the TCP connection, so changing tcp_retries2 cannot define an absolute timeout value.
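
For reference, this is what TCP_USER_TIMEOUT looks like when set directly on a socket. It is a minimal, Linux-only sketch in plain Python, not Envoy configuration, and the helper name is hypothetical. The option takes a value in milliseconds and, unlike tcp_retries2, bounds how long transmitted data may stay unacknowledged before the kernel closes the connection.

```python
import socket

# socket.TCP_USER_TIMEOUT is exposed on Linux builds of Python 3.6+;
# 18 is the Linux value of the constant, used here as a fallback.
TCP_USER_TIMEOUT = getattr(socket, "TCP_USER_TIMEOUT", 18)


def make_socket_with_user_timeout(timeout_ms):
    """Create a TCP socket with TCP_USER_TIMEOUT set (Linux only).

    timeout_ms is the maximum time, in milliseconds, that transmitted data
    may remain unacknowledged before the kernel drops the connection.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT, timeout_ms)
    return sock


if __name__ == "__main__":
    s = make_socket_with_user_timeout(30_000)  # 30 seconds
    print(s.getsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT))
    s.close()
```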
andresp