CopyToAsync vs ReadAsStreamAsync for huge request payload

Question

I have to compute a hash for a huge payload, so I am using streams to avoid loading the entire request content into memory. What are the differences between this code:

using (var md5 = MD5.Create())
using (var stream = await authenticatableRequest.request.Content.ReadAsStreamAsync())
{
    return md5.ComputeHash(stream);
}

And this one:

using (var md5 = MD5.Create())  
using (var stream = new MemoryStream())
{
    await authenticatableRequest.request.Content.CopyToAsync(stream);
    stream.Position = 0;

    return md5.ComputeHash(stream);
}

I expect the same behavior internally, but maybe I am missing something.

Your second version completely misses the point of using streams by forcing the entire contents to be held in memory. — Damien_The_Unbeliever, Jul 31 '18 at 07:44
A MemoryStream is a Stream API on top of a byte buffer. Without specifying the `capacity` in the constructor, the buffer will have to be reallocated multiple times. The second snippet will end up using far more memory than simply reading the entire content as bytes. — Panagiotis Kanavos, Jul 31 '18 at 07:50

bommelding · Accepted Answer · score 4 · 2018-07-31T08:26:21.273

The first version looks OK: let the hasher handle the stream reading. It was designed for that.

ComputeHash(stream) will read blocks in a while loop and call TransformBlock() repeatedly.
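Writing that loop out by hand may make it clearer. The following is a sketch, not the framework's actual source. `HashInBlocks` is a hypothetical helper, and the 4096-byte buffer size is an arbitrary choice. A read-only `MemoryStream` over a random byte array only stands in for the request stream, so the result can be checked against `ComputeHash(stream)`. The example is written as a .NET 6+ top-level program.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper: hashes a stream block by block. It reads into a fixed
// buffer, feeds each block to TransformBlock, and finishes with
// TransformFinalBlock. Only one buffer is held in memory at a time.
static byte[] HashInBlocks(HashAlgorithm hasher, Stream input, int bufferSize = 4096)
{
    var buffer = new byte[bufferSize];
    int read;
    while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
    {
        // The output buffer may be null when only the hash value is needed.
        hasher.TransformBlock(buffer, 0, read, null, 0);
    }
    // An empty final block completes the computation and populates Hash.
    hasher.TransformFinalBlock(Array.Empty<byte>(), 0, 0);
    return hasher.Hash!;
}

// 1 MiB of pseudo-random bytes stands in for the request body (assumption).
var payload = new byte[1 << 20];
new Random(42).NextBytes(payload);

byte[] manual;
using (var md5 = MD5.Create())
using (var source = new MemoryStream(payload, writable: false))
{
    manual = HashInBlocks(md5, source);
}

byte[] builtIn;
using (var md5 = MD5.Create())
using (var source = new MemoryStream(payload, writable: false))
{
    builtIn = md5.ComputeHash(source);
}

Console.WriteLine(BitConverter.ToString(manual) == BitConverter.ToString(builtIn));
```

Both paths produce the same digest. The point is that neither path ever needs more than one buffer's worth of the payload in memory.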

But the second piece of code will load everything into memory, so don't do that:

using (var stream = new MemoryStream())
{
    // The whole request body is copied into the MemoryStream's buffer here
    await authenticatableRequest.request.Content.CopyToAsync(stream);
    // ...
}
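`ComputeHash(stream)` is synchronous, so on a network-backed request stream it may block the calling thread while it waits for data. If that matters, the sketch below shows an asynchronous variant that still never buffers the whole body. It assumes a runtime that provides `IncrementalHash` (for example .NET Core or .NET 5+) and is written as a .NET 6+ top-level program. `HashContentAsync` is a hypothetical helper name, and the 81920-byte buffer is an arbitrary choice. A `ByteArrayContent` stands in for `authenticatableRequest.request.Content`.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Security.Cryptography;
using System.Threading.Tasks;

// Hypothetical helper: hashes an HttpContent body without buffering it.
// The body stream is read asynchronously in fixed-size chunks, and each chunk
// is appended to an IncrementalHash.
static async Task<byte[]> HashContentAsync(HttpContent content, int bufferSize = 81920)
{
    using var hash = IncrementalHash.CreateHash(HashAlgorithmName.MD5);
    using Stream body = await content.ReadAsStreamAsync();

    var buffer = new byte[bufferSize];
    int read;
    while ((read = await body.ReadAsync(buffer, 0, buffer.Length)) > 0)
    {
        hash.AppendData(buffer, 0, read);
    }
    return hash.GetHashAndReset();
}

// Stand-in for the incoming request content (assumption for this example).
var payload = new byte[3 * 1024 * 1024 + 17];
new Random(7).NextBytes(payload);
using var requestContent = new ByteArrayContent(payload);

byte[] streamed = await HashContentAsync(requestContent);

// Reference value computed over the in-memory array, for comparison only.
byte[] expected;
using (var md5 = MD5.Create())
{
    expected = md5.ComputeHash(payload);
}

Console.WriteLine(BitConverter.ToString(streamed));
```

Only one buffer of `bufferSize` bytes is live at any time, so memory use does not grow with the size of the payload.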

answered Jul 31 '18 at 07:46 by bommelding, edited Jul 31 '18 at 08:26

score 2 · Answer 2 · edited Jan 22 '20 at 10:05

I expect the same behavior internally,

Why? In one case you must load everything into memory (because, guess what, you define a memory stream). In the other case, not necessarily.

edited Jan 22 '20 at 10:05 by Alexander Farber

answered Jul 31 '18 at 07:44 by TomTom

score 2 · Answer 3 · answered Jul 31 '18 at 08:05

The second snippet will not only load everything into memory, it will also use more memory than HttpContent.ReadAsByteArrayAsync().

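A MemoryStream is a Stream API over a byte[] buffer whose initial size is zero. Whenever written data no longer fits, the buffer is reallocated into a buffer twice as large as the current one, and the existing contents are copied over. This creates many temporary buffer objects. The total memory allocated can therefore be well above the size of the final content, since the last buffer alone can be almost twice as large as the data it holds.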

This can be avoided by passing the maximum expected size as the capacity parameter of the MemoryStream(int capacity) constructor, so that the buffer is allocated once, up front.

At best, this will be similar to calling:

// Still buffers the whole body, but in a single array of the exact size
var bytes = await authenticatableRequest.request.Content.ReadAsByteArrayAsync();
return md5.ComputeHash(bytes);
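The cost of growing the buffer can be observed directly. The sketch below is a .NET 6+ top-level program. It writes the same 1 MiB into a default MemoryStream and into one created with an explicit capacity, and counts how often `Capacity` changes. Each change means a larger byte[] was allocated and the old contents were copied into it. `CountCapacityChanges` is a hypothetical helper, and the sizes are arbitrary. In the question's setting, a capacity hint could come from `Content.Headers.ContentLength`, assuming the client sends a Content-Length header.

```csharp
using System;
using System.IO;

// Hypothetical helper: writes totalBytes into the stream in chunkSize pieces
// and counts how many times the stream's Capacity changes along the way.
static int CountCapacityChanges(MemoryStream stream, int totalBytes, int chunkSize)
{
    var chunk = new byte[chunkSize];
    int changes = 0;
    int lastCapacity = stream.Capacity;
    for (int written = 0; written < totalBytes; written += chunkSize)
    {
        stream.Write(chunk, 0, Math.Min(chunkSize, totalBytes - written));
        if (stream.Capacity != lastCapacity)
        {
            changes++; // a new, larger internal buffer was allocated
            lastCapacity = stream.Capacity;
        }
    }
    return changes;
}

const int payloadSize = 1 << 20; // 1 MiB, arbitrary example size
const int blockSize = 4096;      // arbitrary copy-buffer size

using var growing = new MemoryStream();             // capacity starts at 0
using var presized = new MemoryStream(payloadSize); // buffer reserved up front

int growingChanges = CountCapacityChanges(growing, payloadSize, blockSize);
int presizedChanges = CountCapacityChanges(presized, payloadSize, blockSize);

Console.WriteLine($"default:  {growingChanges} reallocations, final capacity {growing.Capacity}");
Console.WriteLine($"presized: {presizedChanges} reallocations, final capacity {presized.Capacity}");
```

The presized stream should report no reallocations. The default one goes through a series of progressively larger buffers before it can hold the full payload.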

answered Jul 31 '18 at 08:05 by Panagiotis Kanavos

Small annotation: this holds for the current version of MemoryStream. But there is no reason why MemoryStream couldn't follow `List<>` in using chained blocks. It's probably just that the demand isn't so high. – bommelding Jul 31 '18 at 12:54
@bommelding List [doesn't use chained blocks](https://github.com/dotnet/corefx/blob/master/src/Common/src/CoreLib/System/Collections/Generic/List.cs#L26) even in .NET Core. As for MemoryStream, it's meant to be a wrapper over a buffer. The alternative is the new System.IO.Pipelines namespace. – Panagiotis Kanavos Jul 31 '18 at 12:59
@bommelding The low-allocation alternative is the new System.IO.Pipelines namespace, explained by Marc Gravell in [his articles](https://blog.marcgravell.com/2018/07/pipe-dreams-part-1.html). – Panagiotis Kanavos Jul 31 '18 at 13:06