
I'm using this, slightly modified, to copy large files from a file share, with the ability to resume copying if the download was disrupted. It runs in a BackgroundWorker and reports progress. This works fine, but I'd like to write the current MD5 hash to disk (the running hash of everything copied so far, not a separate hash per block) each time a block of file data is written, with minimal additional overhead. If a partial file is discovered, I'd like to read the MD5 hash from file, and if it is identical to that of the partial file, continue copying. When the file has been copied completely, the MD5 hash in the file should be that of the completely copied file. I'd like to use that later to determine that the files in source and destination are identical. Thanks for any help!
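
Conceptually, I'm after something like the sketch below. As far as I can tell, .NET's `MD5`/`HashAlgorithm` exposes no public way to snapshot its internal state mid-stream, so the running hash can't literally be checkpointed to disk after each block; the workaround sketched here (a hypothetical helper, not my actual code) rebuilds the hash state by re-reading the partial file on resume and then keeps feeding `TransformBlock` as new blocks are written:

    using System.IO;
    using System.Security.Cryptography;

    // Hypothetical helper: rebuild the MD5 state by re-hashing an existing partial file.
    public static MD5 RebuildMd5FromPartialFile(string partialPath, int bufferSize = 0x200000)
    {
        MD5 md5 = MD5.Create();
        byte[] buffer = new byte[bufferSize];
        using (FileStream fs = new FileStream(partialPath, FileMode.Open, FileAccess.Read))
        {
            int read;
            while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
                md5.TransformBlock(buffer, 0, read, null, 0); // feed data; no output buffer needed
        }
        return md5;
    }

    // In the copy loop, after each successful dest.Write(...):
    //     md5.TransformBlock(buffer, 0, destLength, null, 0);
    // After the last block has been written:
    //     md5.TransformFinalBlock(Array.Empty<byte>(), 0, 0);
    //     byte[] finalHash = md5.Hash; // hash of the completely copied file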

This is my current copy method:

    public static bool CopyFile(List<CopyObjects> FileList, FSObjToCopy job, BackgroundWorker BW)
    {
        Stopwatch sw = new Stopwatch();
        long RestartPosition = 0;
        bool Retry = false;
        int BYTES_TO_READ = 0x200000;
        foreach (CopyObjects co in FileList)
        {
            FileInfo fi = co.file;
            FileInfo fo = null;

            if (fi.Directory.FullName.StartsWith($@"{Test_Updater_Core.ServerName}\{Test_Updater_Core.ServerTemplateRoot}"))
            {

                if (File.Exists(fi.FullName.Replace($@"{Test_Updater_Core.ServerName}\{Test_Updater_Core.ServerTemplateRoot}", $@"{ Test_Updater_Core.USBStore_Drive.driveInfo.Name.Replace("\\", "")}\{Test_Updater_Core.UsbTemplateRoot}")))
                {
                    fi = new FileInfo(fi.FullName.Replace($@"{Test_Updater_Core.ServerName}\{Test_Updater_Core.ServerTemplateRoot}", $@"{Test_Updater_Core.USBStore_Drive.driveInfo.Name.Replace("\\", "")}\{Test_Updater_Core.UsbTemplateRoot}"));
                    co.destination = co.destination.Replace($@"{Test_Updater_Core.USBStore_Drive.driveInfo.Name.Replace("\\", "")}\{Test_Updater_Core.UsbTemplateRoot}", $@"{Test_Updater_Core.LocalInstallDrive}\{Test_Updater_Core.LocalTemplateRoot}");
                    fo = new FileInfo($"{fi.FullName.Replace($@"{Test_Updater_Core.USBStore_Drive.driveInfo.Name.Replace("\\", "")}\{Test_Updater_Core.UsbTemplateRoot}", $@"{Test_Updater_Core.LocalInstallDrive}\{Test_Updater_Core.LocalTemplateRoot}")}{Test_Updater_Core.TempFileExtension}");
                }
            }
            
            //If a clean cancellation was requested, we do it here, otherwise the BackgroundWorker will be killed
            if (BW.CancellationPending)
            {
                job.Status = FSObjToCopy._Status.Complete;
                return false;
            }
            //If a pause is requested, we loop here until resume or termination has been signaled
            while (job.PauseBackgroundWorker == true)
            {
                Thread.Sleep(100);
                if (BW.CancellationPending)
                {
                    job.Status = FSObjToCopy._Status.Complete;
                    return false;
                }
                Application.DoEvents();
            }
            if (fo == null)
                fo = new FileInfo($"{fi.FullName.Replace(job.Source, co.destination)}{Test_Updater_Core.TempFileExtension}");

            if (fo.Exists)
            {
                Retry = true;
                RestartPosition = fo.Length - BYTES_TO_READ;
            }
            else
            {
                RestartPosition = 0;
                Retry = false;
            }
            if (RestartPosition <= 0)
            {
                Retry = false;
            }

            sw.Start();

            try
            {
                // Open the source file for reading
                FileStream source = new FileStream(fi.FullName, FileMode.Open, FileAccess.Read);
                // Open (or create) the destination file for writing
                FileStream dest = new FileStream(fo.FullName, FileMode.OpenOrCreate, FileAccess.Write);
                // Number of bytes actually read by the last Read call
                int destLength = 0;
                // If the source file is larger than one read buffer, copy it in chunks
                if (BYTES_TO_READ < source.Length)
                {
                    byte[] buffer = new byte[BYTES_TO_READ];
                    long copied = 0;
                    if (Retry)
                    {
                        source.Seek(RestartPosition, SeekOrigin.Begin);
                        dest.Seek(RestartPosition, SeekOrigin.Begin);
                        Retry = false;
                    }
                    while (copied <= source.Length - BYTES_TO_READ)
                    {
                        destLength = source.Read(buffer, 0, BYTES_TO_READ);

                        // Write only the bytes actually read; Read may return fewer than requested
                        dest.Write(buffer, 0, destLength);
                        dest.Flush();
                        // Keep the destination position in sync with the source
                        dest.Position = source.Position;
                        copied += destLength;
                        job.CopiedSoFar += destLength;
                        if (sw.ElapsedMilliseconds > 250)
                        {
                            job.PercComplete = (int)((float)job.CopiedSoFar / job.TotalFileSize * 100);

                            sw.Restart();
                            job.ProgressCell.Value = job.PercComplete;
                            BW.ReportProgress(job.PercComplete < 100 ? job.PercComplete : 99);
                        }
                        if (BW.CancellationPending)
                        {
                            job.Status = FSObjToCopy._Status.Complete;
                            return false;
                        }
                        while (job.PauseBackgroundWorker == true)
                        {
                            Thread.Sleep(100);
                            if (BW.CancellationPending)
                            {
                                job.Status = FSObjToCopy._Status.Complete;
                                return false;
                            }
                            Application.DoEvents();
                        }
                    }
                    // Copy whatever is left over (less than one full buffer)
                    int left = (int)(source.Length - copied);
                    destLength = source.Read(buffer, 0, left);
                    dest.Write(buffer, 0, destLength);
                    dest.Flush();
                    job.CopiedSoFar += destLength;
                }
                else
                {
                    // The whole file fits into one buffer: copy it in a single read/write
                    byte[] buffer = new byte[source.Length];
                    int read = source.Read(buffer, 0, buffer.Length);
                    dest.Write(buffer, 0, read);
                    dest.Flush();
                    job.CopiedSoFar += read;
                    job.PercComplete = (int)((float)job.CopiedSoFar / job.TotalFileSize * 100);
                    job.ProgressCell.Value = job.PercComplete;
                    BW.ReportProgress(job.PercComplete < 100 ? job.PercComplete : 99);
                }
                source.Close();
                dest.Close();
                fo.LastWriteTimeUtc = fi.LastWriteTimeUtc;
                if (File.Exists(fo.FullName))
                {
                    if (File.Exists(fo.FullName.Replace($"{Test_Updater_Core.TempFileExtension}", "")))
                    {
                        File.Delete(fo.FullName.Replace($"{Test_Updater_Core.TempFileExtension}", ""));
                    }
                    File.Move(fo.FullName, fo.FullName.Replace($"{Test_Updater_Core.TempFileExtension}", ""));
                }
                job.ProgressCell.Value = job.PercComplete;
                BW.ReportProgress(job.PercComplete);
            }
            catch (Exception ex)
            {
                MessageBox.Show($"There was an error copying:{Environment.NewLine}{fi}{Environment.NewLine}to:" +
                    $"{Environment.NewLine}{fo}{Environment.NewLine}The error is: {Environment.NewLine}{ex.Message}",
                    "Error", MessageBoxButtons.OK, MessageBoxIcon.Error);
                job.Status = FSObjToCopy._Status.Error;
                return false;
            }
            finally
            {
                sw.Stop();
            }
        }
        return true;
    }
  • There was a line "md5InFile.TransformBlock(buffer,)" that should not have been there. My apologies! – Daro Dec 09 '21 at 20:53
  • The BackgroundWorker class has been obsolete since 2012, fully replaced by async/await, Task.Run and IProgress. The [MD5](https://learn.microsoft.com/en-us/dotnet/api/system.security.cryptography.md5?view=net-6.0) class has a [ComputeHashAsync(Stream, CancellationToken)](https://learn.microsoft.com/en-us/dotnet/api/system.security.cryptography.hashalgorithm.computehashasync?view=net-6.0#System_Security_Cryptography_HashAlgorithm_ComputeHashAsync_System_IO_Stream_System_Threading_CancellationToken_) method, which means all this code can be replaced with 4-5 lines (see the sketch after these comments). – Panagiotis Kanavos Dec 09 '21 at 21:31
  • You can also process multiple files concurrently, something that's impossible with BGW – Panagiotis Kanavos Dec 09 '21 at 21:32
  • @Panagiotis Kanavos: Awesome! Care to share? – Daro Dec 09 '21 at 21:33
  • I also actually have a BGW for each batch of files. So each call to CopyFile(List FileList, FSObjToCopy job, BackgroundWorker BW) actually has multiple, related files (FileList), but can be called in parallel. – Daro Dec 09 '21 at 21:41
  • What is the actual problem you want to solve, though? The article you link to is completely wrong. `FileStream` doesn't load everything into memory. `Stream` has both `CopyTo(Stream)` and `CopyToAsync(Stream)` that [always worked by using a small configurable buffer](https://github.com/microsoft/referencesource/blob/master/mscorlib/system/io/stream.cs#L218). In .NET Core even that was improved by using a shared buffer from an `ArrayPool`. The only thing that's not already available is progress reporting, but the code is just a few lines. – Panagiotis Kanavos Dec 10 '21 at 08:58
  • As for checking whether the partially downloaded file is OK, that's not trivial, and MD5 isn't the best choice. To get the actual hash you have to transform the final block as well. Most download managers just discard the latest bytes when downloading and resume e.g. 1-4 KB before the end; they only check hashes at the end. When using concurrent partial downloads, they check the hashes of the chunks. Cloud transfer utilities for AWS/Azure download chunks anyway, so they can compare the chunk's hash with the hash and ETag in the response's header. In fact they go even further, also checking versions. – Panagiotis Kanavos Dec 10 '21 at 09:01
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/240013/discussion-between-daro-and-panagiotis-kanavos). – Daro Dec 10 '21 at 13:02
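
A minimal sketch of the async approach suggested in the comments, assuming .NET 5+ for `ComputeHashAsync` (the class and method names here are illustrative, not from the question's code):

    using System;
    using System.IO;
    using System.Security.Cryptography;
    using System.Threading;
    using System.Threading.Tasks;

    public static class AsyncCopySketch
    {
        // Copies a file with cancellation and progress reporting, then hashes the finished copy.
        public static async Task<byte[]> CopyWithHashAsync(
            string sourcePath, string destPath,
            IProgress<long> progress, CancellationToken ct)
        {
            const int BufferSize = 0x200000;
            using (var source = new FileStream(sourcePath, FileMode.Open, FileAccess.Read,
                                               FileShare.Read, BufferSize, useAsync: true))
            using (var dest = new FileStream(destPath, FileMode.Create, FileAccess.Write,
                                             FileShare.None, BufferSize, useAsync: true))
            {
                var buffer = new byte[BufferSize];
                long totalCopied = 0;
                int read;
                while ((read = await source.ReadAsync(buffer, 0, buffer.Length, ct)) > 0)
                {
                    await dest.WriteAsync(buffer, 0, read, ct);
                    totalCopied += read;
                    progress?.Report(totalCopied); // marshalled to the thread that created the IProgress
                }
            }

            // Hash the finished file; ComputeHashAsync is available from .NET 5 on.
            using (var md5 = MD5.Create())
            using (var verify = File.OpenRead(destPath))
                return await md5.ComputeHashAsync(verify, ct);
        }
    }

From an async UI event handler this can be awaited directly; a `Progress<long>` created on the UI thread takes over the role of `BackgroundWorker.ReportProgress`, and several files can be copied concurrently with `Task.WhenAll`.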

1 Answer


I decided to create checksum files on the server, each containing a series of per-block checksums. As I copy a file, I add the checksum of each copied block to an internal list and compare it against the server's list. If at some point they no longer match, I go back to the last point where they were identical and resume from there. At the end of the copy job, I write the checksums from the internal list to disk under the same name as the file on the server. If I'd like to check the integrity of a file later, I can compare the server's checksum file to the local one and verify the checksums. A sketch of this approach is below.
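
A minimal sketch of the block-checksum idea (the class name, block size, and comparison helper are illustrative assumptions, not the exact production code):

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Security.Cryptography;

    public static class BlockChecksumSketch
    {
        const int BlockSize = 0x200000; // must match the buffer size used while copying

        // One MD5 per fixed-size block; the resulting list can be written next to the
        // file and compared block-by-block against the server's list when resuming.
        public static List<byte[]> ComputeBlockChecksums(string path)
        {
            var checksums = new List<byte[]>();
            var buffer = new byte[BlockSize];
            using (var md5 = MD5.Create())
            using (var fs = File.OpenRead(path))
            {
                int read;
                // FileStream.Read fills the buffer except at end of file,
                // so each checksum covers one full block (the last may be shorter)
                while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
                    checksums.Add(md5.ComputeHash(buffer, 0, read));
            }
            return checksums;
        }

        // Byte offset of the first differing block, i.e. where the copy should resume.
        public static long FirstMismatchOffset(IReadOnlyList<byte[]> server, IReadOnlyList<byte[]> local)
        {
            int common = Math.Min(server.Count, local.Count);
            for (int i = 0; i < common; i++)
                if (!server[i].SequenceEqual(local[i]))
                    return (long)i * BlockSize;
            return (long)common * BlockSize; // all shared blocks match; resume after them
        }
    }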
