I have to compute a hash for a huge payload, so I am using streams to avoid loading the entire request content into memory. What are the differences between this code:

using (var md5 = MD5.Create())
using (var stream = await authenticatableRequest.request.Content.ReadAsStreamAsync())
{
    return md5.ComputeHash(stream);
}

And that one:

using (var md5 = MD5.Create())  
using (var stream = new MemoryStream())
{
    await authenticatableRequest.request.Content.CopyToAsync(stream);
    stream.Position = 0;

    return md5.ComputeHash(stream);
}

I expect the same behavior internally, but maybe I am missing something.

Jevgenij Nekrasov
  • Your second version completely misses the point of using streams by forcing the entire contents to be held in memory. – Damien_The_Unbeliever Jul 31 '18 at 07:44
  • A MemoryStream is a Stream API on top of a byte buffer. Without specifying the `capacity` in the constructor, the buffer will have to be reallocated multiple times. The second snippet will end up using far more memory than simply reading the entire content as bytes. – Panagiotis Kanavos Jul 31 '18 at 07:50

3 Answers

The first version looks OK: let the hasher handle reading the stream. It was designed for that.

ComputeHash(stream) reads the stream in fixed-size blocks in a while loop and feeds each block to the hash, much like calling TransformBlock() repeatedly.
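The block-by-block reading described above can be sketched by hand. This is only an illustration of the idea, not the framework's actual implementation; the class name `StreamHashing`, the method name `ComputeMd5InChunks`, and the 4096-byte default buffer size are assumptions made for the example.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class StreamHashing
{
    // Hypothetical helper: hashes a stream chunk by chunk, so only one
    // buffer of bufferSize bytes is held in memory at any time.
    public static byte[] ComputeMd5InChunks(Stream input, int bufferSize = 4096)
    {
        using (var md5 = MD5.Create())
        {
            var buffer = new byte[bufferSize];
            int bytesRead;

            // Feed each chunk into the running hash state.
            while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                md5.TransformBlock(buffer, 0, bytesRead, null, 0);
            }

            // Finalize with an empty block; the result is then available in Hash.
            md5.TransformFinalBlock(buffer, 0, 0);
            return md5.Hash;
        }
    }
}
```

In practice `md5.ComputeHash(stream)` already does this work for you, so a hand-written loop like this is mainly useful when the data arrives in pieces that are not exposed as a single Stream.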

But the second piece of code will load everything into memory, so don't do that:

using (var stream = new MemoryStream())
{
    await authenticatableRequest.request.Content.CopyToAsync(stream);
bommelding

I expect the same behavior internally,

Why? In one case you must load everything into memory (because, guess what, you define a memory stream). In the other case, not necessarily.

Alexander Farber
TomTom

The second snippet will not only load everything into memory, it will use more memory than HttpContent.ReadAsByteArrayAsync().

A MemoryStream is a Stream API over a byte[] buffer whose initial size is zero. As data is written into it, the buffer has to be reallocated repeatedly, each time into a new buffer twice as large as the previous one. This can create a lot of temporary buffer objects whose combined size exceeds the final content.

This can be avoided by allocating the maximum expected buffer size from the beginning by providing the capacity parameter to the MemoryStream() constructor.

At best, this will be similar to calling:

// ReadAsByteArrayAsync() returns a Task<byte[]>, so it must be awaited
var bytes = await authenticatableRequest.request.Content.ReadAsByteArrayAsync();
return md5.ComputeHash(bytes);
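To see the buffer growth described in this answer, one could record how `MemoryStream.Capacity` changes as data is written. This is a rough sketch for experimentation; the class name `MemoryStreamGrowth`, the method name `RecordCapacities`, and the 1024-byte chunk size are hypothetical, and the exact capacity values depend on the runtime version.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class MemoryStreamGrowth
{
    // Hypothetical helper: writes totalBytes in chunkSize pieces and records
    // every distinct Capacity value the stream reports along the way.
    public static List<int> RecordCapacities(MemoryStream stream, int totalBytes, int chunkSize = 1024)
    {
        var capacities = new List<int> { stream.Capacity };
        var chunk = new byte[chunkSize];
        int written = 0;

        while (written < totalBytes)
        {
            int count = Math.Min(chunkSize, totalBytes - written);
            stream.Write(chunk, 0, count);
            written += count;

            // A changed Capacity means the internal buffer was reallocated.
            if (stream.Capacity != capacities[capacities.Count - 1])
            {
                capacities.Add(stream.Capacity);
            }
        }

        return capacities;
    }
}
```

Calling `RecordCapacities(new MemoryStream(), 1000000)` should return a list with many entries, one per reallocation, whereas `RecordCapacities(new MemoryStream(1000000), 1000000)` should return a single entry because the preallocated buffer never has to grow.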
Panagiotis Kanavos
  • Small annotation: this holds for the current version of MemoryStream. But there is no reason why MemoryStream couldn't follow `List<>` in using chained blocks. It's probably just that the demand isn't so high. – bommelding Jul 31 '18 at 12:54
  • @bommelding List [doesn't use chained blocks](https://github.com/dotnet/corefx/blob/master/src/Common/src/CoreLib/System/Collections/Generic/List.cs#L26) even in .NET Core. As for MemoryStream, it's meant to be a wrapper over a buffer. The alternative is the new System.IO.Pipelines namespace. – Panagiotis Kanavos Jul 31 '18 at 12:59
  • @bommelding The low-allocation alternative is the new System.IO.Pipelines namespace, which Marc Gravell explains in [his articles](https://blog.marcgravell.com/2018/07/pipe-dreams-part-1.html). – Panagiotis Kanavos Jul 31 '18 at 13:06