
I'm downloading a file and then checking its MD5 hash to confirm that the download succeeded. The following code works, but it isn't very efficient, especially for large files.

        using (var client = new System.Net.WebClient())
        {
            client.DownloadFile(url, destinationFile);
        }

        var fileHash = GetMD5HashAsStringFromFile(destinationFile);
        var successful = expectedHash.Equals(fileHash, StringComparison.OrdinalIgnoreCase);

My concern is that the bytes are all streamed through to disk, and then the MD5 ComputeHash() has to open the file and read all the bytes again. Is there a good, clean way of computing the MD5 as part of the download stream? Ideally, the MD5 should just fall out of the DownloadFile() function as a side effect of sorts. A function with a signature like this:

string DownloadFileAndComputeHash(string url, string filename, HashTypeEnum hashType);

Edit: Added the code for GetMD5HashAsStringFromFile()

    public string GetMD5HashAsStringFromFile(string filename)
    {
        // Both the file stream and the hash algorithm are IDisposable.
        using (FileStream file = File.Open(filename, FileMode.Open, FileAccess.Read, FileShare.Read))
        using (var md5er = System.Security.Cryptography.MD5.Create())
        {
            var md5HashBytes = md5er.ComputeHash(file);
            return BitConverter
                    .ToString(md5HashBytes)
                    .Replace("-", string.Empty)
                    .ToLowerInvariant();
        }
    }
davidpricedev
  • Are you asking if there's a built-in method to do that? Or are you asking us how to _write_ such a method? And you might want to show us the code of GetMD5HashAsStringFromFile. – John Saunders May 15 '15 at 21:58
  • I don't have the whole answer for you, but see [ComputeHash](https://msdn.microsoft.com/en-us/library/xa627k19.aspx) and [OpenRead](https://msdn.microsoft.com/en-us/library/ms144209.aspx). – John Saunders May 15 '15 at 22:02
  • And look at [WebClient.DownloadData](https://msdn.microsoft.com/en-us/library/xz398a3f(v=vs.110).aspx) to get a byte array from the download – Gabriel GM May 15 '15 at 22:03
  • @JohnSaunders I've added the code for the `GetMD5HashAsStringFromFile` function. I was hoping that there would be something in the framework that already did this. And if not, what direction to go for a good clean solution. – davidpricedev May 19 '15 at 16:22
  • @GabrielGM - using the DownloadData function will be very hard on memory requirements - especially for the large (1+GB) files where a more efficient solution really matters. – davidpricedev May 19 '15 at 16:24
  • @dave - Totally agree. There is a good answer already taking that into account. – Gabriel GM May 19 '15 at 17:33

3 Answers


Is there a good, clean way of computing the MD5 as part of the download stream? Ideally, the MD5 should just fall out of the DownloadFile() function as a side effect of sorts.

You could follow this strategy, to do "chunked" calculation and minimize memory pressure (and duplication):

  1. Open the response stream on the web client.
  2. Open the destination file stream.
  3. Repeat while there is data available:
    • Read chunk from response stream into byte buffer
    • Write it to the destination file stream.
    • Use the TransformBlock method to add the bytes to the hash calculation
  4. Call TransformFinalBlock to finish the calculation, then read the result from the Hash property.

The sample code below shows how this could be achieved.

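using System;
using System.IO;
using System.Net;
using System.Security.Cryptography;

public static byte[] DownloadAndGetHash(Uri file, string destFilePath, int bufferSize)
{
    using (var md5 = MD5.Create())
    using (var client = new System.Net.WebClient())
    {
        // Stream the response instead of buffering the whole download in memory.
        using (var src = client.OpenRead(file))
        using (var dest = File.Create(destFilePath, bufferSize))
        {
            md5.Initialize();
            var buffer = new byte[bufferSize];
            while (true)
            {
                var read = src.Read(buffer, 0, buffer.Length);
                if (read > 0)
                {
                    // Write the chunk to disk and feed the same bytes to the hash.
                    dest.Write(buffer, 0, read);
                    md5.TransformBlock(buffer, 0, read, null, 0);
                }
                else // reached the end.
                {
                    // Finalize with an empty block; the result is then available in Hash.
                    md5.TransformFinalBlock(buffer, 0, 0);
                    return md5.Hash;
                }
            }
        }
    }
}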
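
Since DownloadAndGetHash returns raw bytes while the question compares hex strings, a small conversion step is needed. The following is only a sketch of one way to do that; the class name HashText and its methods are hypothetical helpers, not part of the framework.

```csharp
using System;

public static class HashText
{
    // Hypothetical helper: formats hash bytes as a lowercase hex string without dashes.
    public static string ToHex(byte[] hash)
    {
        return BitConverter.ToString(hash).Replace("-", string.Empty).ToLowerInvariant();
    }

    // Hypothetical helper: compares hash bytes against an expected hex string, ignoring case.
    public static bool Matches(byte[] hash, string expectedHex)
    {
        return string.Equals(ToHex(hash), expectedHex, StringComparison.OrdinalIgnoreCase);
    }
}
```

With this in place, the check from the question could become something like `HashText.Matches(DownloadAndGetHash(new Uri(url), destinationFile, 81920), expectedHash)`, where the buffer size of 81920 is an arbitrary choice.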
Alex

If you're talking about large files (I'm assuming over 1GB), you'll want to read the data in chunks, run each chunk through the MD5 algorithm, and then write it to disk. It's doable, but I don't know how much the built-in .NET classes will help you with that.

One approach might be with a custom stream wrapper. First you get a Stream from WebClient (via GetWebResponse() and then GetResponseStream()), then you wrap it, and then pass it to ComputeHash(stream). When MD5 calls Read() on your wrapper, the wrapper would call Read on the network stream, write the data out when it's received, and then pass it back to MD5.

I don't know what problems would await you if you try and do this.
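
As a rough illustration of the wrapper idea, here is a minimal sketch. The names TeeReadStream and TeeExample are hypothetical, the wrapper is read-only and non-seekable, and it does not dispose the streams it is given; it assumes ComputeHash(Stream) simply calls Read until it returns 0.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical wrapper: every byte read from the source is also written to a second stream.
public sealed class TeeReadStream : Stream
{
    private readonly Stream _source;
    private readonly Stream _copy;

    public TeeReadStream(Stream source, Stream copy)
    {
        _source = source;
        _copy = copy;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int read = _source.Read(buffer, offset, count);
        if (read > 0)
        {
            // Persist the chunk before handing it back to the caller (the hash algorithm).
            _copy.Write(buffer, offset, read);
        }
        return read;
    }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override void Flush() { _copy.Flush(); }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
}

public static class TeeExample
{
    // Copies source to destination while computing the MD5 of the bytes that pass through.
    public static byte[] CopyAndHash(Stream source, Stream destination)
    {
        using (var md5 = MD5.Create())
        using (var tee = new TeeReadStream(source, destination))
        {
            return md5.ComputeHash(tee);
        }
    }
}
```

For a download, source would be the network stream and destination a FileStream, so the file is written and hashed in a single pass.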

Vilx-

Something like this.

byte[] result;
using (var webClient = new System.Net.WebClient())
{
    result = webClient.DownloadData("http://some.url");
}

byte[] hash = ((HashAlgorithm)CryptoConfig.CreateFromName("MD5")).ComputeHash(result);
Sergey
  • With large files this would mean reading the whole thing into memory first. That might be even less efficient than writing it out to disk (if it's possible at all). – Vilx- May 15 '15 at 22:09
  • What about downloading the file as a stream of bytes, a manageable number of equal-size blocks at a time, hashing the contents of each batch, and combining the results? http://upload.wikimedia.org/wikipedia/commons/thumb/e/ed/Merkle-Damgard_hash_big.svg/400px-Merkle-Damgard_hash_big.svg.png – Sergey May 15 '15 at 22:19
  • Well, yes, that's the general idea, but how do you do that without re-implementing MD5 yourself? – Vilx- May 15 '15 at 22:21
  • Looking at an implementation of MD5, I think it's doable: you could potentially reuse the implementation but replace the method that splits byte blocks from the stream. http://rosettacode.org/wiki/MD5/Implementation#C.23 – Sergey May 15 '15 at 22:27