
I have two servers, let's call them Milo and Otis. Milo and Otis are set up as an active-passive pair of highly available servers, where Milo is usually the master and Otis stands by, waiting for the unlikely failure of Milo, at which point Otis will take over the shared virtual IP. I have a question about what happens to the SSL connection during the failure.

Consider the following:

  1. Some client (me) makes an SSL connection to Milo.
  2. The SSL connection is set to stay alive, so let's say a webpage is requested over the SSL connection. The page is downloaded completely and the connection remains open, ready for another request (let's say an asset like a css file).
  3. Before the request for the css file is started, Milo experiences some catastrophic failure and now Otis has taken over.
  4. What happens now that I want to make a request for the css file? I still think I have an open connection to Milo, but the Virtual IP is now pointing to Otis.

Does Otis pick up the SSL sessions that Milo had automatically? Does my browser start to communicate with Otis and Otis says "Hey, we should probably shake hands first."? Any and all comments/answers about this would be appreciated.

1 Answer


When failover occurs, currently active connections will break. This will happen at the TCP level, however, so SSL/TLS won't even enter the scene. A TCP connection requires that both endpoints know about the connection and hold the right state (sequence numbers, window sizes, etc.), so when the IP fails over, the backup machine will have no knowledge of the TCP connections which the master had established. When the backup machine receives packets which belong to those previous TCP connections, it will reply with an RST packet, which will cause the client to close the connection.
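The lookup described above can be sketched as a toy model. This is not how a real kernel is written; `Host`, `accept` and `receive` are hypothetical names, and the addresses are documentation placeholders. It only shows that a packet for a connection the receiving host has never seen gets an RST, even though the destination IP is unchanged.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# A TCP connection is identified by
# (client IP, client port, server IP, server port).
ConnKey = Tuple[str, int, str, int]


@dataclass
class Host:
    """Toy stand-in for one server's TCP stack (hypothetical, illustration only)."""
    name: str
    # Per-connection state; a real stack tracks far more than one number.
    connections: Dict[ConnKey, int] = field(default_factory=dict)

    def accept(self, key: ConnKey, initial_seq: int) -> None:
        """Record state for a connection established by a handshake."""
        self.connections[key] = initial_seq

    def receive(self, key: ConnKey, seq: int) -> str:
        """Return the kind of reply this host sends for an incoming segment."""
        if key not in self.connections:
            # No state for this connection: reject it with a reset.
            return "RST"
        self.connections[key] = seq + 1
        return "ACK"


if __name__ == "__main__":
    vip = "203.0.113.10"                      # shared virtual IP (placeholder)
    key = ("198.51.100.7", 51000, vip, 443)   # client side is a placeholder too

    milo, otis = Host("Milo"), Host("Otis")
    milo.accept(key, 1000)                    # handshake happened with Milo
    print("Milo replies:", milo.receive(key, 1000))

    # Failover: the VIP now points at Otis, whose connection table is empty.
    print("Otis replies:", otis.receive(key, 1001))
```

The client keeps sending to the same IP and port, but the key is missing from Otis's table, which is the whole reason the connection cannot continue.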

Even if TCP connections were recovered, a similar situation would occur with SSL/TLS. Each session requires state at each end, including the session key (which is what actually secures the application data). The backup machine won't have this session state, and so existing sessions would be terminated.

mgorven
  • So, even though the IP doesn't change as far as the client is concerned, the new (Otis) server will go "Whoa buddy, we don't have a TCP connection yet"? Is that right? – Patrick James McDougle Mar 18 '13 at 21:11
  • @PatrickJamesMcDougle Correct. – mgorven Mar 18 '13 at 21:11
  • `currently active connections will break` - no, it's what *should* happen (sometimes it doesn't - fencing is a major issue with any clustering). `This will happen the TCP level` - not necessarily - a lot depends on where the SSL is terminated - which is not necessarily on the server. `The backup machine won't have this session state` - sorry, this is very wrong. SSL session sharing across servers has been around for a very long time. – symcbean Mar 18 '13 at 23:43
  • @symcbean I'd be interested in more details about those points, hope you're writing up an answer :-) – mgorven Mar 19 '13 at 00:00
  • http://linux-ha.org/wiki/STONITH , http://linux.die.net/man/8/distcache , http://journal.paul.querna.org/articles/2010/07/10/overclocking-mod_ssl/ and it's trivial to handle failed GETs (POSTs should never be recoverable) at the application level. – symcbean Mar 19 '13 at 09:48
  • @symcbean thanks for your insights. Just to clarify the SSL is terminated on Milo and Otis, not on any other machine. – Patrick James McDougle Mar 19 '13 at 12:40
  • @symcbean It looks like that SSL session sharing allows sessions to be *resumed* on a new TCP connection -- even if the TCP connection didn't drop, I doubt that SSL could resume that specific connection. – mgorven Mar 19 '13 at 16:28
  • Apparently you can transparently migrate TCP sessions to other servers, some folks at MIT have [done this](http://nms.lcs.mit.edu/projects/migrate/). – Patrick James McDougle Mar 19 '13 at 17:49
  • @mgorven: yes - while, as Patrick says, it's technically *possible*, in practice it's rather difficult to fail over a TCP connection - only the SSL session can be shared. This is a bit of a problem for SPDY (which relies on long keep-alives - the client only knows the connection has failed when it times out, whereas for new TCP connections, all browsers can detect non-availability much faster) – symcbean Mar 20 '13 at 12:34
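To illustrate the point made in the comments about handling failed GETs at the application level, here is a minimal sketch. It assumes a hypothetical `send` callable that opens a fresh connection when needed and raises `ConnectionResetError` (or `BrokenPipeError`) when the old one was reset; `send_with_retry` and `IDEMPOTENT_METHODS` are names made up for this example, not part of any library.

```python
from typing import Callable

# Methods treated as safe to repeat in this sketch. POST is deliberately
# excluded: the first attempt may already have been processed.
IDEMPOTENT_METHODS = {"GET", "HEAD"}


def send_with_retry(
    send: Callable[[str, str], bytes],
    method: str,
    path: str,
    retries: int = 1,
) -> bytes:
    """Call send(method, path), retrying idempotent requests after a reset.

    `send` is a hypothetical transport function supplied by the caller.
    """
    attempts = retries + 1 if method.upper() in IDEMPOTENT_METHODS else 1
    for attempt in range(attempts):
        try:
            return send(method, path)
        except (ConnectionResetError, BrokenPipeError):
            if attempt == attempts - 1:
                # Out of attempts, or a non-idempotent request: give up.
                raise
    raise AssertionError("unreachable")
```

With this shape, the css request from the question would fail once when Otis answers with an RST and then succeed on a new connection (with a new handshake), while a POST would surface the error to the caller instead of being silently repeated.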