I am looking at making a write optimization for CIFS/SMB such that writes of duplicate blocks are suppressed. For example, I read a file from the remote share and modify a portion near the end of the file. When I save the file, I only want to send write requests back to the remote side for the portions of the file that have actually changed. So basically, suppress all writes up until the point at which a non-duplicate write is encountered. At that point the suppression is disabled and writes are allowed as usual. The problem is that I can't find any documentation, in MS-SMB/MS-SMB2/MS-CIFS or otherwise, that indicates whether or not this is a valid thing to do. Does anyone know if this would be valid?
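To make the idea concrete, here is a minimal local sketch of the suppression logic, assuming the client still holds a copy of the file as it was read from the share. The function names `first_changed_block` and `write_from_first_change` and their arguments are hypothetical, and plain local files stand in for the remote side. It only illustrates the logic and says nothing about whether the protocol specifications permit it.

```shell
#!/bin/sh
# Hypothetical sketch: suppress writes for the unchanged prefix of a file.
# $1 is the file as originally read, $2 is the edited copy.

# Print the 0-based index of the first block that differs, or nothing
# when the two files are identical.
first_changed_block() {
    original=$1; modified=$2; block_size=$3
    # cmp -l lists each differing byte as "<1-based offset> <octal> <octal>".
    first_byte=$(cmp -l "$original" "$modified" 2>/dev/null | head -n 1 | awk '{print $1}')
    if [ -n "$first_byte" ]; then
        echo $(( ($first_byte - 1) / $block_size ))
        return 0
    fi
    # No differing byte in the common prefix: the files may still differ in length.
    size_a=$(( $(wc -c < "$original") ))
    size_b=$(( $(wc -c < "$modified") ))
    if [ "$size_a" -ne "$size_b" ]; then
        smaller=$size_a
        [ "$size_b" -lt "$smaller" ] && smaller=$size_b
        echo $(( $smaller / $block_size ))
    fi
}

# Write MODIFIED ($2) into TARGET ($3), starting at the first changed block.
# $1 is the original copy, $4 is the block size in bytes.
write_from_first_change() {
    block=$(first_changed_block "$1" "$2" "$4")
    # Nothing changed: every write is suppressed.
    [ -n "$block" ] || return 0
    # Skip the unchanged blocks in the source and seek past them in the target.
    dd if="$2" of="$3" bs="$4" skip="$block" seek="$block" conv=notrunc 2>/dev/null
}
```

Note that this sketch does not truncate the target, so a modified file that is shorter than the original would need separate handling.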
3 Answers
Dig deep into the Linux kernel sources: there is documentation on CIFS there, both in the source code and as text files. E.g. http://www.mjmwired.net/kernel/Documentation/filesystems/cifs.txt
If you want to study the behaviour of e.g. the CIFS protocol, you can test it with the Unix command "dd". Mount any remote file system via CIFS, e.g. at /media/remote. Change into this folder:

    cd /media/remote

Now create a file with some random data (e.g. from the kernel's random pool):

    dd if=/dev/urandom of=test.bin bs=4M count=5

In this example, you should see some 20 MB of traffic. Then create another, smaller file somewhere on your local machine, say in your home folder:

    dd if=/dev/urandom of=~/test_chunk.bin bs=4M count=1

The interesting thing is what happens if you attempt to write the chunk into a specific position of the remote test file:

    dd if=~/test_chunk.bin of=test.bin bs=4M count=1 seek=3 conv=notrunc

This should only change block #4 out of 5 in the target file. I guess you can adjust the block size ... I did this with 4 MB blocks. But it should help to understand what happens on the network.
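To check which blocks really changed, independently of the packet capture, the file can be compared against a copy taken before the seek write. This is a rough sketch: the function name `changed_blocks` and the idea of keeping a "before" copy are assumptions for illustration, not part of the test above.

```shell
#!/bin/sh
# Hypothetical helper: print the 0-based index of every block in which
# the two files differ, one index per line.
# $1 and $2 are the files to compare, $3 is the block size in bytes.
changed_blocks() {
    # cmp -l prints one line per differing byte, with the 1-based byte
    # offset in the first column; awk maps offsets to block indices and
    # drops consecutive repeats of the same index.
    cmp -l "$1" "$2" 2>/dev/null |
        awk -v bs="$3" '{
            b = int(($1 - 1) / bs)
            if (NR == 1 || b != last) print b
            last = b
        }'
}
```

For example, after copying test.bin to a local file before the last dd command, `changed_blocks ~/test_before.bin test.bin 4194304` should list only index 3 (block #4), provided the random chunk differs from the old data.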

- Thanks for pointing me in that direction, but it wasn't overly helpful. This is an optimization that I don't see anyone else using, and I have examined many packet captures. I would expect the protocol to have something to say about it, but perhaps this would be more of an application-specific thing. – Chappelle Dec 06 '12 at 21:17
- The protocol needs to support it, that's why looking into e.g. the sources makes sense. But yes, the actual behaviour should be defined by the application. It should be interesting to see how the "dd" command behaves when used on a file system that is mounted via CIFS. You can address specific blocks of a file with dd. Doing a packet capture on this one must be really interesting :-) – s-m-e Dec 06 '12 at 21:24
- I updated the answer with some sort of a test case based on "dd". – s-m-e Dec 06 '12 at 21:52
The CIFS protocol does allow applications to write back specific portions of the file. This is controlled by the Offset and DataLength fields of the SMB WriteAndX (SMB_COM_WRITE_ANDX) request; DataOffset only locates the payload within the SMB message itself.
The documentation can be found here: http://msdn.microsoft.com/en-us/library/ee441954.aspx
The client can use these fields to write a specific length of data to specific offsets within the file.
Similar support exists in more recent versions of the protocol as well ...
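As a local analogy for an offset-plus-length write, the following sketch overwrites a byte range of an existing file in place. The function name `write_range` is hypothetical, and whether a CIFS client turns such a call into a single write request with exactly that offset depends on the client and its caching, so treat it as an illustration only.

```shell
#!/bin/sh
# Hypothetical helper: overwrite part of TARGET with the contents of SOURCE,
# starting at a byte offset and leaving the rest of TARGET untouched.
# $1 = target file, $2 = byte offset in the target, $3 = source file.
write_range() {
    # bs=1 makes seek count in bytes; conv=notrunc keeps the data that
    # follows the written range. The write length is the size of SOURCE.
    dd if="$3" of="$1" bs=1 seek="$2" conv=notrunc 2>/dev/null
}
```

A block size of 1 keeps the offset arithmetic simple but is slow for large payloads.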

The SMB protocol has a comparable write optimization for the CIFS append operation: the client reads the file's EOF and starts writing the new data with the offset set to the EOF value and the length set to the number of bytes being appended.
