
TL;DR:

I have two machines, A and B. I'm writing a program that tests the medium (interface) between them: it checks for errors when copying a file from A to B and then back from B to A, and it must run as fast as possible. The source file SRC is on A; I copy it to a new file MID on B, then copy MID from B back to A as a new file DST, and finally compare SRC with DST. The question is how to do this at the highest possible speed (i.e. in parallel).

Elaborate:

How can I copy a file while it is still being written? I use CopyFileEx to copy SRC to MID, and I must copy MID to DST at the same time. The data must explicitly go through the disk; I cannot use memory buffers or caches, and:

  1. the second copy must run while MID is still being created - I cannot wait for the first copy to finish.
  2. I must explicitly read the file back from MID - I can't reuse the buffer used to copy from SRC to MID.
  3. all of this must run as fast as possible.

I can handle the synchronization easily (I use CopyFileEx's CopyProgressRoutine callback to learn how many bytes have been copied and fire events accordingly), but the file is locked for reading while it's being copied. I can't use C#'s normal FileStream - it's far too slow...

Possible solutions I'm currently looking into:

  • Volume Shadow Copy (specifically AlphaVSS)
  • a memory-mapped file - I got this working and it is very fast, but I'm afraid the system actually serves the reads from cache and doesn't really read back from MID
  • some Win32 API function (via P/Invoke) that I don't know of?
Tar
  • `FileStream` is just a thin wrapper around the WIN API's file functions , so if that's too slow I'm not sure what might be faster... – Matthew Watson Jul 02 '13 at 06:41
  • @MatthewWatson: maybe reading a file into memory and writing it from memory to disk is inherently slow, and using the WinAPI to do that won't be faster than the managed `FileStream`. I suspect that's true, which is why I'm not inclined to go in that direction. I use the WinAPI's `CopyFileEx`, which is as fast as copying a file through the Windows interface (C#'s `File.Copy` has the same speed, but native `CopyFileEx` also provides a "progress callback" that managed `File.Copy` lacks; I use this callback to parallelize things, which I can't do the "C# way") – Tar Jul 02 '13 at 06:51
  • Why don't you 1) open the input file with a FileStream and 2) write the two output at the same time using two FileStreams? Like Stream.CopyTo but with multiple output streams with the same buffer instead of only one output stream. Each output could be done on a different thread/task, but not sure it will change anything if the hard disk is the same. – Simon Mourier Jul 02 '13 at 08:43
  • @SimonMourier: because I want SRC to copy to MID, and MID copy to DST, not SRC copy both to MID and DST - the whole point here is that MID copies to DST: SRC -> MID -> DST – Tar Jul 02 '13 at 10:36
  • What's the difference between SRC->MID & SRC-> DST and SRC->MID->DST ? – Simon Mourier Jul 02 '13 at 11:57
  • @SimonMourier: that's the whole point here, otherwise that was simple and I wouldn't have been asking this question... MID is located on some device (say, network or pen-drive), and I want the data to explicitly go through the wire twice, back and forth (network: via the Ethernet, pen-drive: via the USB channel), but I want to do it the fastest way I can – Tar Jul 03 '13 at 11:47
  • @tal - I'm still not sure I understand, but if you need an answer I suggest you update your question with this context information, which is crucial. – Simon Mourier Jul 03 '13 at 13:14

2 Answers

To be able to read the file while it's being written, the handle that writes it must have been opened with a dwShareMode that includes FILE_SHARE_READ, and the reading handle must in turn allow FILE_SHARE_WRITE. You may have to ditch CopyFileEx and implement the copy yourself using CreateFile/ReadFile/WriteFile. For asynchronous reading and writing, open the file with FILE_FLAG_OVERLAPPED and use the lpOverlapped parameter of the ReadFile/WriteFile functions.
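
In C#, the same share modes are exposed through the FileShare argument of the FileStream constructor, so the idea can be tried without P/Invoke. The following is only a minimal sketch: the class and method names (`SharedOpenDemo`, `WriteThenReadWhileOpen`) are made up for illustration, and it says nothing about whether the read is served from the device or from the file-system cache.

```csharp
using System;
using System.IO;
using System.Text;

static class SharedOpenDemo
{
    // Hypothetical helper: writes `payload` to `path` through one handle and reads
    // it back through a second, independent handle while the writer is still open.
    public static string WriteThenReadWhileOpen(string path, string payload)
    {
        byte[] data = Encoding.UTF8.GetBytes(payload);

        // FileShare.Read on the writer corresponds to dwShareMode = FILE_SHARE_READ.
        using (var writer = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.Read))
        // The reader has to tolerate the already-open writer, hence FileShare.ReadWrite.
        using (var reader = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            writer.Write(data, 0, data.Length);
            writer.Flush(); // hand FileStream's internal buffer over to the OS

            var buffer = new byte[data.Length];
            int total = 0;
            while (total < buffer.Length)
            {
                int n = reader.Read(buffer, total, buffer.Length - total);
                if (n == 0) break; // nothing more available yet
                total += n;
            }
            return Encoding.UTF8.GetString(buffer, 0, total);
        }
    }
}
```

If the writer were opened with FileShare.None instead, the second constructor call would throw an IOException, which matches the "file is locked for reading" symptom described in the question.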

Igor Skochinsky

The basic idea is to open the MID file for reading and writing. The simple single-threaded way to do it is:

private static void FunkyCopy(string srcFname, string midFname, string dstFname)
{
    using (FileStream srcFile = new FileStream(srcFname, FileMode.Open, FileAccess.Read, FileShare.None),
                        midFile = new FileStream(midFname, FileMode.Create, FileAccess.ReadWrite,
                                                FileShare.ReadWrite),
                        dstFile = new FileStream(dstFname, FileMode.Create, FileAccess.Write, FileShare.None))
    {
        long totalBytes = 0;
        var buffer = new byte[65536];
        while (totalBytes < srcFile.Length)
        {
            var srcBytesRead = srcFile.Read(buffer, 0, buffer.Length);
            if (srcBytesRead > 0)
            {
                // write to the mid file
                midFile.Write(buffer, 0, srcBytesRead);
                // now read from mid and write to dst
                midFile.Position = totalBytes;
                var midBytesRead = midFile.Read(buffer, 0, srcBytesRead);
                if (midBytesRead != srcBytesRead)
                {
                    throw new ApplicationException("Error reading Mid file!");
                }
                dstFile.Write(buffer, 0, srcBytesRead);
            }
            totalBytes += srcBytesRead;
        }
    }
}

As you noted, that's going to be pretty slow. You can speed it somewhat by making two threads: one for doing the SRC -> MID copy, and another for doing the MID -> DST copy. It's a little more involved, but not terribly so.

static void FunkyCopy2(string srcFname, string midFname, string dstFname)
{
    var cancel = new CancellationTokenSource();
    const int bufferSize = 65536;

    var finfo = new FileInfo(srcFname);
    Console.WriteLine("File length = {0:N0} bytes", finfo.Length);
    long bytesCopiedToMid = 0;
    AutoResetEvent bytesAvailable = new AutoResetEvent(false);

    // First thread copies from src to mid
    var midThread = new Thread(() =>
        {
            Console.WriteLine("midThread started");
            using (
                FileStream srcFile = new FileStream(srcFname, FileMode.Open, FileAccess.Read, FileShare.None),
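                            // MID is written here, so it needs write access
                            // (FileMode.Create combined with FileAccess.Read throws)
                            midFile = new FileStream(midFname, FileMode.Create, FileAccess.Write,
                                                    FileShare.ReadWrite))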
            {
                var buffer = new byte[bufferSize];
                while (bytesCopiedToMid < finfo.Length)
                {
                    var srcBytesRead = srcFile.Read(buffer, 0, buffer.Length);
                    if (srcBytesRead > 0)
                    {
                        midFile.Write(buffer, 0, srcBytesRead);
                        Interlocked.Add(ref bytesCopiedToMid, srcBytesRead);
                        bytesAvailable.Set();
                    }
                }
            }
            Console.WriteLine("midThread exit");
        });

    // Second thread copies from mid to dst
    var dstThread = new Thread(() =>
        {
            Console.WriteLine("dstThread started");
            using (
                FileStream midFile = new FileStream(midFname, FileMode.Open, FileAccess.Read,
                                                    FileShare.ReadWrite),
                            dstFile = new FileStream(dstFname, FileMode.Create, FileAccess.Write, FileShare.Write)
                )
            {
                long bytesCopiedToDst = 0;
                var buffer = new byte[bufferSize];
                while (bytesCopiedToDst != finfo.Length)
                {
                    // if we've already copied everything from mid, then wait for more.
                    if (Interlocked.Read(ref bytesCopiedToMid) == bytesCopiedToDst)
                    {
                        bytesAvailable.WaitOne();
                    }
                    var midBytesRead = midFile.Read(buffer, 0, buffer.Length);
                    if (midBytesRead > 0)
                    {
                        dstFile.Write(buffer, 0, midBytesRead);
                        bytesCopiedToDst += midBytesRead;
                        Console.WriteLine("{0:N0} bytes copied to destination", bytesCopiedToDst);
                    }
                }
            }
            Console.WriteLine("dstThread exit");
        });

    midThread.Start();
    dstThread.Start();

    midThread.Join();
    dstThread.Join();
    Console.WriteLine("Done!");
}

That'll speed things up quite a bit because the read and write in the second thread can largely overlap the read and write in the first thread. Most likely, your limiting factor will be the speed of the disk that MID is stored on.

You can get some speed increase by doing asynchronous writes. That is, have the thread read a buffer and then fire off an asynchronous write. While that write is executing, the next buffer is being read. Just remember to wait for the asynchronous write to finish before starting another asynchronous write in that thread. So each thread looks like:

while (bytes left to copy)
    Read buffer
    wait for previous write to finish
    write buffer
end while

I don't know how much of a performance boost that will give you, because you're gated on the concurrent access to the MID file. But it's probably worth the effort to try.
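
The read/wait/write loop above could be sketched in C# roughly as follows. This is an assumption-laden illustration rather than a tuned implementation: the names `OverlappedCopy` and `CopyAsync` are hypothetical, it relies on `Task.CompletedTask` (.NET Framework 4.6 or later), and it uses two buffers so that the buffer being written is never overwritten by the next read.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

static class OverlappedCopy
{
    // Hypothetical helper: copies `source` to `destination`, reading the next chunk
    // while the previous chunk's write is still in flight. Returns bytes copied.
    public static async Task<long> CopyAsync(Stream source, Stream destination, int bufferSize = 65536)
    {
        // Two buffers: one is being filled while the other is being written.
        var buffers = new[] { new byte[bufferSize], new byte[bufferSize] };
        int current = 0;
        long total = 0;
        Task pendingWrite = Task.CompletedTask;

        while (true)
        {
            // Read buffer
            int read = await source.ReadAsync(buffers[current], 0, bufferSize);
            // Wait for previous write to finish
            await pendingWrite;
            if (read == 0) break; // end of stream
            // Write buffer (not awaited here, so the next read can overlap it)
            pendingWrite = destination.WriteAsync(buffers[current], 0, read);
            total += read;
            current = 1 - current; // switch to the other buffer
        }
        return total;
    }
}
```

For file streams, the overlap is only real if the FileStream was opened for asynchronous I/O (for example with `FileOptions.Asynchronous`); otherwise the asynchronous calls may simply run the synchronous operations on a thread-pool thread.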

I know that the synchronization code there will prevent the second thread from trying to read when it shouldn't. I think it will prevent a situation in which the second thread locks up because it's waiting for a signal after the first thread has exited. If there is any doubt, you can either have a ManualResetEvent that is used to say that the first thread is done, and use WaitHandle.WaitAny to wait on it and the AutoResetEvent, or you can use a timeout on the WaitOne, like this:

bytesAvailable.WaitOne(1000); // waits a second before trying again
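
The ManualResetEvent alternative might look like the sketch below. The names `WaitDemo`, `WaitForDataOrDone` and `writerDone` are hypothetical; the assumption is that the first thread calls `writerDone.Set()` once after its copy loop ends.

```csharp
using System;
using System.Threading;

static class WaitDemo
{
    // Hypothetical helper: blocks until either more data has been signalled or the
    // writer has finished. Returns true for "more data", false for "writer is done".
    public static bool WaitForDataOrDone(AutoResetEvent bytesAvailable, ManualResetEvent writerDone)
    {
        // WaitAny returns the index of the signalled handle. If both are signalled,
        // the smallest index wins, so pending data is reported before completion.
        int index = WaitHandle.WaitAny(new WaitHandle[] { bytesAvailable, writerDone });
        return index == 0;
    }
}
```

In the second thread this call would stand in for `bytesAvailable.WaitOne()`. When it returns false, the reader should still drain whatever is left in MID before exiting, because the final chunk may have been written just before the done event was set.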
Jim Mischel
  • Thanks for the effort! I noticed you used `midFile.Position = totalBytes;` - is it a replacement for `midFile.Flush();` ? anyhow, unfortunately that's very slow, I tried that (just with "immediately flush" option or with explicit `midFile.Flush();`). – Tar Jul 07 '13 at 19:04
  • @Tal: Setting the position like that is equivalent to seeking. It might not actually cause the file system to flush its buffers to the device. The file system could serve the read request from its internal buffer. If you want to ensure that requests are serviced from the device, then the flush is necessary. If you do that it's going to be slow because you're limiting the speed of your program to the speed of the device. Still, with asynchronous writes, the total time to copy should be less than the time to copy both files sequentially. – Jim Mischel Jul 07 '13 at 23:57
  • Unfortunately that's not the case. Flushing, even with big chunks (10 MB), makes the copy about 20 times slower than a normal copy! It's a mystery to me why there's such a great difference. – Tar Jul 08 '13 at 06:59