
Our app, running on client 'server A', creates a file on a Windows Server 2008 R2 file server using:

CreateFile(LockFileName,                                         // lpFileName
           GENERIC_READ or GENERIC_WRITE,                        // dwDesiredAccess
           FILE_SHARE_READ,                                      // dwShareMode: others may read
           nil,                                                  // lpSecurityAttributes
           CREATE_ALWAYS,                                        // dwCreationDisposition
           FILE_FLAG_WRITE_THROUGH or FILE_FLAG_DELETE_ON_CLOSE, // dwFlagsAndAttributes
           0);                                                   // hTemplateFile

The client is testing a disaster scenario by powering off 'server A' and leaving it off. They report that our app running on 'server B', using the same filename and the same code fragment above, fails (i.e. the file continues to exist) for at least 15 minutes until, we believe, they browse to the folder containing the file in Windows Explorer, at which point the file is deleted automatically.

Is anyone aware of how this is supposed to behave in this situation? When the creating server has gone away, should the handles be released and the file removed automatically? And why does looking at the file cause it to be deleted?

Interestingly, on another supposedly similar setup the issue does not occur.

Sam Cogan
  • Not an answer, but we've seen related problems with CIFS shares before with a large cluster setup. Even had strange problems related to executables being updated underneath actively running processes in the cluster. We've found that it *could* be as long as 8 hours for the timeout to be noticed... – user7116 Feb 05 '12 at 22:28

3 Answers


[...] where the creating server has gone away, should the handles be released and the file removed automatically?

Eventually yes, but not immediately. Since you are running Windows Server 2008 R2 (and thus SMBv2; I am assuming both the file server and the clients run Windows Server 2008 R2), the client will request a durable file handle. According to sections 3.3.6.2 and 3.3.7.1 of the SMBv2 specification, the server must start the durable open scavenger timer (16 minutes by default on Windows Server). Once the timer expires, the server must examine all open handles and close those that have not been reclaimed by a client. Note that this 16-minute default lines up with the "at least 15 minutes" you are observing.

In your scenario, of course, an open question is whether the server detects the connection loss to the client at all, since according to your description the client (i.e. the whole server, not just the process) is killed instantly.

Now assume that another client tries to open the file while the durable timeout is still running, i.e. while the server still considers the file to be open by the first client. The server is then supposed to send an oplock break notification (section 2.2.23.1) to the client that originally opened the file. As that client is unable to respond (it has been powered off), the server will wait for the oplock break acknowledgment timeout to expire (section 3.3.2.1; 35 seconds by default) before granting the new client access to the file.
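The practical upshot for the second client is that its open attempt can keep failing until that 35-second timeout expires, so a caller should be prepared to retry. A minimal sketch of the timing in C, with a stub standing in for the real Windows `CreateFile` call (the function names and intervals here are illustrative assumptions, not measurements):

```c
#include <stdbool.h>

/* Server default oplock break acknowledgment timeout, per MS-SMB2 3.3.2.1. */
#define OPLOCK_BREAK_ACK_TIMEOUT 35

/* Stub standing in for CreateFile on the second client: the open succeeds
 * only once the server has given up waiting for the dead client's oplock
 * break acknowledgment. */
static bool try_open_lock_file(int elapsed_seconds)
{
    return elapsed_seconds >= OPLOCK_BREAK_ACK_TIMEOUT;
}

/* Retry every retry_interval seconds; return the elapsed time at which the
 * open succeeded, or -1 if we gave up first. */
int seconds_until_open(int retry_interval, int give_up_after)
{
    for (int elapsed = 0; elapsed <= give_up_after; elapsed += retry_interval) {
        if (try_open_lock_file(elapsed))
            return elapsed;
        /* In real code this would be Sleep(retry_interval * 1000). */
    }
    return -1;
}
```

With a 5-second retry interval and a 60-second budget, the open succeeds at the 35-second mark, i.e. only once the oplock break acknowledgment timeout has elapsed; a budget shorter than 35 seconds never succeeds.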

There is one other thing to note: the behavior will be different if the second client accesses the file via a local path rather than a UNC path. In that case the client won't have to wait for the oplock break acknowledgment timeout; Windows will grant it access to the file immediately while it tries to send a close request to the first client.

This is how the system is supposed to behave. As to why you are experiencing the issues described, I cannot tell. I wouldn't be surprised if you had stumbled upon a bug in the file-server implementation of Windows Server 2008 R2. I would try to troubleshoot the issue using the tools mentioned in the other answers (Process Monitor is really nice), and Wireshark helps a lot too.

afrischke

There is nothing to say there should no longer be any handles when the creating server goes down. For a handle to be removed, something has to initiate that removal. If a server goes down abruptly, it cannot remove its handles, so those handles remain open. As far as the surviving file server is concerned, all is well, and no file handles should be forcibly closed.

Until something actually tries to act upon the file handle, that is. At that point the server notices that the owner of the handle is gone, because it tries to initiate communication with that host. Once it realizes this, the file handle gets forcibly closed.
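The two paragraphs above boil down to a small state machine: a dead client's handle lingers until some operation on the file forces the server to check the owner's liveness, and only then does DELETE_ON_CLOSE fire. A toy model in C (the type and function names are invented for illustration; this is not the real file-server logic):

```c
#include <stdbool.h>

/* Toy model of an orphaned server-side handle. */
typedef struct {
    bool open;            /* handle still registered on the file server */
    bool owner_alive;     /* is the client that opened it still reachable? */
    bool delete_on_close; /* file was opened with FILE_FLAG_DELETE_ON_CLOSE */
    bool file_exists;     /* does the file still exist on disk? */
} handle_t;

/* Nothing touches the file: the server has no reason to notice the dead
 * owner, so the handle, and hence the file, lingers indefinitely. */
void idle(handle_t *h) { (void)h; }

/* Someone acts on the file (e.g. Explorer enumerates the folder): the
 * server tries to contact the owner, notices it is gone, closes the
 * handle, and the close fires DELETE_ON_CLOSE. */
void touch(handle_t *h)
{
    if (h->open && !h->owner_alive) {
        h->open = false;
        if (h->delete_on_close)
            h->file_exists = false;
    }
}
```

In this model, any number of `idle()` steps leave the file in place; the first `touch()` by anyone, such as Explorer browsing the folder, closes the orphaned handle and deletes the file.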

Thus, to answer your question, this seems like perfectly predictable and expected behavior to me.

The reason file handles get closed immediately in another environment probably has to do with something keeping those servers in constant communication: something is constantly accessing a remote file. That's just a guess, though.

Update

Sysinternals, bought out by Microsoft a few years ago, has a great tool called Process Explorer that allows you to search processes' open file handles. This might be of use to you in determining which program(s) are refreshing the file handle(s).

Sysinternals also has Process Monitor, which allows you to see in real-time as programs act upon file handles. This could be another useful program in troubleshooting the issue.

Edit: Oh, and if you really want to have fun, there's Handle, too.

Zenexer
  • So whilst I don't disagree with what you are saying, the issue we have is that it is not consistently behaving in the same way. In both systems there are a number of other servers, all writing to the same file store, so pretty much constant communication with the file server, and the same share, so I can't find a consistent reason why they should act differently. If I can find that I can solve the issue. Thanks for your help anyway. – Sam Cogan Feb 09 '12 at 11:40
  • Well, as I explained, it's not uncommon for some program on a computer to be accessing files continuously. Thus, the handles are checked and rechecked quite often. In just one case, nothing is accessing the relevant files. Perhaps all the other servers have antivirus, or something as simple as that. – Zenexer Feb 09 '12 at 12:14
  • @Sam I added two tools that could help you determine exactly what is causing the file handle to "refresh" (and thus close) in the other environments. For quick access to those tools, connect to the `\\live.sysinternals.com\tools` share. – Zenexer Feb 09 '12 at 12:24
  • Very good question, by the way. It _is_ a very curious behavior that seems counter-intuitive. – Zenexer Feb 09 '12 at 15:14

So far this looks like a non-issue to me, or at least one that cannot be handled outside of Microsoft's programming AND one that has side effects when handled. Basically you have to account for small disruptions of communication between client and server and optimize network traffic, so the server cannot permanently exchange packets with the client just to see whether the client is still around.

Computer programming must take that into account as far as possible, but timeouts like that are normal unless the client application handles them properly. The main question (totally unanswered) is whether this is an issue at all; so far it looks like "standard behavior".

Is anyone aware of how this is supposed to behave in this situation, where the creating server has gone away, should the handles be released and the file removed automatically?

How would the server know?

And why does looking at the file cause it to delete?

Possibly it is the reading that triggered a refresh that timed out, so, in the end, this triggered the defined behavior (DELETE_ON_CLOSE).

I would guess that any access to certain elements of the file would trigger this, but the tester did nothing except refresh Explorer.

TomTom
  • The behaviour we expect, and that we are seeing on other systems, is that this file is cleaned up as soon as server A goes down; this is not happening. The release of the handle should cause a clean-up, but it is not. – Sam Cogan Feb 05 '12 at 22:22
  • But server A going down is not something the file server is able to identify RIGHT NOW. It is not as if a failing server tells the file server that it is no longer available. – TomTom Feb 06 '12 at 05:07
  • But as the server has gone down, there should be no handles open on that file, therefore it should get cleaned up. Whilst I agree there may be a delay in this happening, from what we are seeing this never happens, unless you go and look at the folder in Explorer. – Sam Cogan Feb 06 '12 at 09:54