
We have clients that, on startup, map an SMB share on a FreeNAS server. We've noticed that after a power loss, they sometimes have trouble connecting to the SMB share.

After a bit of debugging and packet capture, it appears the problem is that the client is choosing a previously used ephemeral port that, due to the power failure, was never properly closed. When the SYN is sent to start the SMB connection, the server responds with a bare ACK carrying a very high acknowledgment number, instead of the expected SYN-ACK.

It appears the server sees the SYN from the previously used, never-closed ephemeral port and tries to continue the old conversation. This back-and-forth continues for a while (over a minute), and the resulting errors and delays are problematic.
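
A minimal sketch of the server's side of this exchange, assuming the server still holds the old connection in the ESTABLISHED state. The `Segment` and `StaleConnection` types and all sequence numbers are hypothetical illustrations, not FreeNAS code; the reply follows RFC 793's rule for an unacceptable segment and RFC 5961's "challenge ACK" for a SYN on a synchronized connection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical, simplified TCP segment: only the fields this example needs."""
    seq: int
    ack: int = 0
    syn: bool = False
    ack_flag: bool = False
    rst: bool = False


@dataclass
class StaleConnection:
    """Server-side state of a connection the server still believes is ESTABLISHED."""
    snd_nxt: int  # next sequence number the server will send
    rcv_nxt: int  # next sequence number the server expects from the (rebooted) client

    def handle_syn(self, seg: Segment) -> Segment:
        if not seg.syn:
            raise ValueError("this sketch only models an incoming SYN")
        # The new SYN carries a fresh ISN, so it almost never falls in the old
        # receive window. RFC 793 answers an unacceptable segment with
        # <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>; RFC 5961 section 4 sends the same
        # "challenge ACK" for any SYN, in-window or not. Note: no SYN flag.
        return Segment(seq=self.snd_nxt, ack=self.rcv_nxt, ack_flag=True)


server = StaleConnection(snd_nxt=3_000_000_000, rcv_nxt=1_234_567)
reply = server.handle_syn(Segment(seq=42, syn=True))
print(reply)  # Segment(seq=3000000000, ack=1234567, syn=False, ack_flag=True, rst=False)
```

The acknowledgment number in the reply is the old connection's `RCV.NXT`, which is unrelated to the client's new ISN; that is why the reply looks like an ACK for a conversation the client no longer remembers.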

Who bears the responsibility for dealing with this situation? The client or the server? Any specs or details would be very helpful. Who, if anyone, is misbehaving in this situation?

Update: The behavior that @fendall describes is what Windows proper does. However, the currently released version of WinPE, v10.0.18362.1, does not behave appropriately. Instead, it tries new ephemeral ports, with each subsequent port getting more attempts, until it finally gives up. The WinPE version that ships with the Win10 1903 installer works fine. With the released WinPE, if you've recently had connections on the fairly deterministic handful of ephemeral ports it tries, you're out of luck: no TCP connections can be made until the server decides to forget about them.

js2010
aggieNick02
  • I'm not sure if this is related, but I find multicasting to be about 8x as slow with any version of WinPE above Windows 10 1607. – js2010 Sep 13 '19 at 18:45

1 Answer


The answer to your question, from my interpretation, is that the client is responsible.

In response to the server's ACK, the client should detect that "this segment does not acknowledge anything it sent and, being unsynchronized, sends a reset (RST) because it has detected a half-open connection." Reference: RFC 793, p. 34.
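
The client-side rule can be sketched as follows. This models only the SYN-SENT check from the event-processing rules in section 3.9 of RFC 793 (if SEG.ACK =< ISS or SEG.ACK > SND.NXT, send <SEQ=SEG.ACK><CTL=RST>), using modulo-2^32 sequence arithmetic. `Segment`, `syn_sent_reply`, and the numbers are hypothetical; this is not how any particular TCP stack is written.

```python
from dataclasses import dataclass
from typing import Optional

MOD = 2**32  # TCP sequence space


@dataclass(frozen=True)
class Segment:
    """Hypothetical, simplified TCP segment."""
    seq: int
    ack: int = 0
    syn: bool = False
    ack_flag: bool = False
    rst: bool = False


def syn_sent_reply(iss: int, seg: Segment) -> Optional[Segment]:
    """Return the RST a client in SYN-SENT must send for `seg`, or None if no reset is needed."""
    snd_nxt = (iss + 1) % MOD  # only the SYN (one sequence number) has been sent
    if not seg.ack_flag:
        return None
    # Modular form of "SEG.ACK =< ISS or SEG.ACK > SND.NXT": anything other
    # than an ACK in (ISS, SND.NXT] acknowledges data we never sent.
    offset = (seg.ack - iss) % MOD
    if offset == 0 or offset > (snd_nxt - iss) % MOD:
        if seg.rst:
            return None  # never answer a RST with a RST; just drop it
        return Segment(seq=seg.ack, rst=True)  # <SEQ=SEG.ACK><CTL=RST>
    return None  # acceptable ACK (e.g. a real SYN-ACK): continue the handshake


# The stale server's challenge ACK (hypothetical numbers): ACK = its old RCV.NXT.
server_rcv_nxt = 1_234_567
challenge_ack = Segment(seq=3_000_000_000, ack=server_rcv_nxt, ack_flag=True)
rst = syn_sent_reply(iss=42, seg=challenge_ack)
print(rst)  # Segment(seq=1234567, ack=0, syn=False, ack_flag=False, rst=True)
```

Because the RST's sequence number is copied from the server's ACK field, it equals the server's `RCV.NXT` exactly, which satisfies both RFC 793's in-window test and RFC 5961's stricter exact-match rule for accepting a RST. The server then drops the stale connection, and the client's next SYN retransmission can get a normal SYN-ACK.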

Community
fendall
  • Thanks for the explanation and link; that would make sense. The only thing I can't reconcile is why the server responds with such a huge ACK value; perhaps it is just whatever is needed to be outside the valid window. Regardless, the client does not send the reset the spec seems to say it should. – aggieNick02 Sep 03 '19 at 21:43
  • @aggieNick02 The ACK number is either where the previous conversation left off or, otherwise, a sign that the ACK number stored by the server was corrupted by the power failure. I imagine it is the former. – fendall Sep 03 '19 at 21:54
  • @fendall The server re-ACKs the current sequence number (from its perspective). It's up to the client to reestablish the seemingly nonsensical connection. – Zac67 Sep 03 '19 at 22:06
  • @fendall - just FYI, the server actually doesn't lose power. And from taking a long running trace, the ACK value is not where the conversation left off either, which is bizarre. Instead it is generally a large value - I'm guessing it somehow helps indicate the invalid nature of the SYN in addition to the response which is ACK without SYN – aggieNick02 Sep 04 '19 at 18:47
  • More interesting tidbits. Even on a proper reboot, Windows does not terminate the TCP connections (at least for SMB). However, Windows generally responds appropriately when it gets the server's SYN-free ACK and issues a RST. Windows PE does not do this. Instead, it tries new ephemeral ports, with each subsequent port getting more attempts, until it eventually gives up... I'll update the question with this info. – aggieNick02 Sep 04 '19 at 18:51
  • @aggieNick02 Thanks for the clarification that the large ACK value is not where the conversation left off. I'm curious about this behavior as well. – fendall Sep 04 '19 at 18:52
  • @Zac67 I think we more or less agree and are saying the same thing, but I really appreciate your rephrasing, I think it describes the behavior in a very simple and easy to understand manner – fendall Sep 04 '19 at 18:55
  • See update above. Thanks fendall and Zac67 for the answers/help. It's a bit surprising to find that the root cause is a TCP protocol bug in a certain version of Windows. I'll file it with Microsoft. – aggieNick02 Sep 04 '19 at 21:00
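
One untested hypothesis for the "large ACK value" puzzle (an assumption; nothing in the thread confirms it): Wireshark shows relative sequence and acknowledgment numbers by default, computed as an offset from the connection's initial sequence number. If the new SYN's ISN is used as the reference, the server's ACK, which carries the stale connection's `RCV.NXT`, becomes an essentially random 32-bit offset and usually looks huge. The function name and the numbers below are hypothetical.

```python
MOD = 2**32  # TCP sequence numbers wrap modulo 2**32


def relative(raw: int, reference_isn: int) -> int:
    """Offset of a raw sequence/ack number from a reference ISN, modulo 2**32."""
    return (raw - reference_isn) % MOD


new_client_isn = 2_900_000_000  # ISN in the client's fresh SYN (hypothetical)
stale_rcv_nxt = 1_234_567       # where the old connection left off (hypothetical)

# A genuine SYN-ACK acknowledges ISN + 1, which shows up as a relative Ack of 1.
print(relative(new_client_isn + 1, new_client_isn))  # 1
# The stale server's ACK, measured against the same ISN, lands far away.
print(relative(stale_rcv_nxt, new_client_isn))       # 1396201863
```

Comparing raw (absolute) numbers, for example with Wireshark's relative-sequence-number preference turned off, would show whether the server's ACK field actually matches the old conversation's last sequence number.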