
I'm working on a program that does heavy random-access reads and writes on huge files (up to 64 GB). The files are specifically structured, and to access them I've created a framework; after a while I tried to test its performance, and I noticed that on a preallocated file, sequential write operations are too slow to be acceptable. After many tests I replicated the behavior without my framework (using only FileStream methods); here's the portion of code that (with my hardware) reproduces the issue:

FileStream fs = new FileStream("test1.vhd", FileMode.Open);
byte[] buffer = new byte[256 * 1024];
Random rand = new Random();
rand.NextBytes(buffer);
DateTime start, end;
double elapsed = 0.0;
long startPos, endPos;

// Read the file header: two UInt32 fields plus 65,536 UInt16 entries,
// i.e. 4 + 4 + 131,072 = 131,080 bytes, which leaves fs.Position at 131,080.
BinaryReader br = new BinaryReader(fs);
br.ReadUInt32();
br.ReadUInt32();
for (int i = 0; i < 65536; i++)
    br.ReadUInt16();

br = null;  // not disposed: disposing the reader would close fs as well

// Write 4 GB in 256 KB chunks, timing each write. Note that startPos/endPos
// only control how many chunks are written; the writes themselves continue
// from the current stream position (131,080).
startPos = 0;   // 0
endPos = 4294967296;    // 4 GB
for (long index = startPos; index < endPos; index += buffer.Length)
{
    start = DateTime.Now;
    fs.Write(buffer, 0, buffer.Length);
    end = DateTime.Now;
    elapsed += (end - start).TotalMilliseconds;
}

Unfortunately the issue seems to be unpredictable: sometimes it "works", sometimes it doesn't. However, using Process Monitor I caught the following events:

Operation   Result  Detail
WriteFile   SUCCESS Offset: 1.905.655.816, Length: 262.144
WriteFile   SUCCESS Offset: 1.905.917.960, Length: 262.144
WriteFile   SUCCESS Offset: 1.906.180.104, Length: 262.144
WriteFile   SUCCESS Offset: 1.906.442.248, Length: 262.144
WriteFile   SUCCESS Offset: 1.906.704.392, Length: 262.144
WriteFile   SUCCESS Offset: 1.906.966.536, Length: 262.144
ReadFile    SUCCESS Offset: 1.907.228.672, Length: 32.768, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O, Priority: Normal
WriteFile   SUCCESS Offset: 1.907.228.680, Length: 262.144
ReadFile    SUCCESS Offset: 1.907.355.648, Length: 32.768, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O, Priority: Normal
ReadFile    SUCCESS Offset: 1.907.490.816, Length: 32.768, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O, Priority: Normal
WriteFile   SUCCESS Offset: 1.907.490.824, Length: 262.144
ReadFile    SUCCESS Offset: 1.907.617.792, Length: 32.768, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O, Priority: Normal
ReadFile    SUCCESS Offset: 1.907.752.960, Length: 32.768, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O, Priority: Normal
WriteFile   SUCCESS Offset: 1.907.752.968, Length: 262.144

That is, after overwriting almost 2 GB, FileStream.Write starts to trigger a ReadFile after every WriteFile, and this continues until the end of the process; also, the offset at which the issue begins seems to be random. I've debugged step-by-step inside the FileStream.Write method and verified that it is actually WriteFile (the Win32 API) that, internally, performs the ReadFile.

One last note: I don't think this is a file fragmentation issue; I've defragmented the file myself with Contig!

Atropo
  • Consider switching to [memory mapped files](http://msdn.microsoft.com/en-us/library/dd997372.aspx). – gor Feb 02 '11 at 13:51
  • Do you mean that I should create the references to the Win32 API myself or use .NET 4? In the first case it would be better to write the entire framework in C/C++ (and I'm really considering that possibility!); in the latter I'd also have to upgrade to VS2010 or use SharpDevelop: I prefer to use what I have! – Atropo Feb 02 '11 at 14:54
  • It might be an OS buffering issue, I can't replicate the reads on Win7 x64 and .Net 4.0. (Also, please use `using` blocks, I don't want to cry today) – user7116 Feb 02 '11 at 15:13
  • I've added FileOptions.WriteThrough to the constructor: no changes! – Atropo Feb 02 '11 at 15:37
  • You ran out of RAM for the file system cache. Write speed falls off a cliff since you can now only write as fast as the disk can be written. Get more RAM. – Hans Passant Feb 02 '11 at 19:15
  • @Hans You seem pretty sure about that, but I don't believe it's an out-of-memory problem: first, I have 4 GB of RAM, and Process Explorer tells me that only 2.1 GB of the whole physical memory is used during tests; second, if it's a memory problem, why does the system use ReadFile when I only want to write? – Atropo Feb 03 '11 at 08:23
  • Right, the other 2.1 GB is used by the file system cache. Which you filled up. The ReadFile calls are for the paging file. – Hans Passant Feb 03 '11 at 08:37
  • @Hans Sorry, but I have the counter-proof: if I comment out the portion of code above that does the reads before the for loop, the issue disappears (see the sketch after these comments)! – Atropo Feb 03 '11 at 09:21
  • Another one: using CacheSet (from Sysinternals) I can confirm that the peak size of the file system cache is 102 MB... so it's not out of memory! – Atropo Feb 03 '11 at 09:33
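For reference, a minimal sketch of that counter-proof (a hypothetical fix, not part of the original repro): the header reads leave fs.Position at 4 + 4 + 65536 * 2 = 131,080 bytes, which is 8 bytes past a 4,096-byte page boundary; that is exactly the 8-byte misalignment visible in the WriteFile offsets of the trace above. If the misalignment is the trigger, rounding the position down to a page boundary before the write loop should make the paging reads disappear:

long pos = fs.Position;                   // 131,080 after the header reads
fs.Seek(pos & ~4095L, SeekOrigin.Begin);  // back to 131,072, a 4 KB boundary
// ...then run the same write loop as above...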

2 Answers


I believe this has to do with FileStream.Write / Read and a 2 GB limit. Are you running this in a 32-bit process? I couldn't find any specific documentation on this, but here is an MSDN forum question that sounds the same. You could try running this in a 64-bit process.

I agree, however, that using a memory-mapped file may be a better approach.
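For reference, a minimal sketch of the memory-mapped approach on .NET 4 (file name and sizes taken from the question; mapping a 4 GB view requires a 64-bit process):

using System;
using System.IO;
using System.IO.MemoryMappedFiles;

// Sketch: map the file once and write through a view accessor; the OS
// pages data in and out directly, with no FileStream position involved.
using (var mmf = MemoryMappedFile.CreateFromFile("test1.vhd", FileMode.Open))
using (var view = mmf.CreateViewAccessor(0, 4L * 1024 * 1024 * 1024))
{
    byte[] buffer = new byte[256 * 1024];
    new Random().NextBytes(buffer);
    for (long offset = 0; offset + buffer.Length <= view.Capacity; offset += buffer.Length)
        view.WriteArray(offset, buffer, 0, buffer.Length);
}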

Mike Ohlsen
  • I'm on a Win7 64-bit system! However, I don't think it's a problem with FileStream.Write: I've debugged it (after decompiling mscorlib)! – Atropo Feb 02 '11 at 14:02
  • That's very possible. It's my understanding that .NET is still limited to 32-bit processes, i.e. the 2 GB memory limit. But you're not allocating more than 2 GB, so I doubt that's the problem. – JP Richardson Feb 02 '11 at 14:03
  • Does the .NET app target "Any CPU" or x86? – Mike Ohlsen Feb 02 '11 at 15:01
  • .Net's FileStream.Write/Read can handle files greater than 2GB. – user7116 Feb 02 '11 at 15:10
  • @sixlettervariables: yes, you're right: it can! If not, how could I create a 4 GB file??? – Atropo Feb 02 '11 at 15:12
  • @Mike Ohlsen, I want to +1 this for the Memory Mapped File suggestion, but there is the 2GB/32bit problem in your answer. – user7116 Feb 02 '11 at 15:19

I found this in MSDN. Could it be related? It sounds to me like each file has one globally shared pointer.

When a FileStream object does not have an exclusive hold on its handle, another thread could access the file handle concurrently and change the position of the operating system's file pointer that is associated with the file handle. In this case, the cached position in the FileStream object and the cached data in the buffer could be compromised. The FileStream object routinely performs checks on methods that access the cached buffer to assure that the operating system's handle position is the same as the cached position used by the FileStream object.

http://msdn.microsoft.com/en-us/library/system.io.filestream.aspx
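If that shared-pointer check is the culprit, one experiment (a sketch, not a verified fix) is to open the file so the FileStream has exclusive ownership of the handle, using the constructor overload that takes a share mode and options:

// Sketch: exclusive share so no other handle can move the file pointer,
// an internal buffer matching the 256 KB write size, and WriteThrough to
// bypass the intermediate cache (though a comment below reports that
// WriteThrough alone made no difference).
var fs = new FileStream("test1.vhd",
                        FileMode.Open,
                        FileAccess.ReadWrite,
                        FileShare.None,
                        256 * 1024,
                        FileOptions.WriteThrough);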

Bengie
  • According to the documentation, it seems to be sufficient to use `FileOptions.WriteThrough` to disable every cache between `FileStream.Write` and the disk; but I still observe the `ReadFile` calls during tests. – Atropo Feb 03 '11 at 08:32