
I have a singleton object that processes requests. Each request takes around one millisecond to complete, usually less. This object is not thread-safe, and it expects requests in a particular format, encapsulated in the Request class, and returns the result as a Response. Internally, this processor has another producer/consumer that sends/receives through a socket.

I implemented the producer/consumer approach to make this fast (a sketch of the setup follows the list):

  • The client prepares a RequestCommand command object, which contains a TaskCompletionSource<Response> and the intended Request.
  • The client adds the command to the "request queue" (a Queue<>) and awaits command.Completion.Task.
  • A different thread (an actual background Thread) pulls the command from the "request queue", processes command.Request, generates the Response, and signals the command as done using command.Completion.SetResult(response).
  • The client continues working.
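
Here is a minimal sketch of this setup. The Request/Response types and the Process method are placeholders; the real code is not shown here:

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public sealed class Request { }   // placeholder
public sealed class Response { }  // placeholder

public sealed class RequestCommand
{
    public Request Request { get; }
    public TaskCompletionSource<Response> Completion { get; }
        = new TaskCompletionSource<Response>();

    public RequestCommand(Request request) => Request = request;
}

public sealed class Processor
{
    private readonly Queue<RequestCommand> _queue = new Queue<RequestCommand>();
    private readonly object _gate = new object();

    public Processor()
    {
        // The single consumer: one dedicated background thread.
        new Thread(ConsumeLoop) { IsBackground = true }.Start();
    }

    public Task<Response> EnqueueAsync(Request request)
    {
        var command = new RequestCommand(request);
        lock (_gate)
        {
            _queue.Enqueue(command);
            Monitor.Pulse(_gate);        // wake the consumer
        }
        return command.Completion.Task;  // the client awaits this
    }

    private void ConsumeLoop()
    {
        while (true)
        {
            RequestCommand command;
            lock (_gate)
            {
                while (_queue.Count == 0)
                    Monitor.Wait(_gate); // sleep until a producer pulses
                command = _queue.Dequeue();
            }
            // Safe to touch the non-thread-safe processor here:
            // only this thread ever runs it.
            var response = Process(command.Request);
            command.Completion.SetResult(response);
        }
    }

    private Response Process(Request request) => new Response(); // stand-in
}
```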

But when doing a small memory benchmark I see LOTS of these objects being created; they top the list of the most common objects in memory. Note that there is no memory leak; the GC can clean everything up nicely each time it triggers. But obviously, creating so many objects so fast makes Gen 0 very big, and I wonder whether better memory usage might yield better performance.

I was considering converting some of these objects to structs to avoid allocations, especially now that C# 7.1 has some new features for working with them. But I do not see a way of doing it:

  • Value types can be instantiated on the stack, but if they pass from thread to thread, I guess they must be copied stackA->heap and then heap->stackB. Also, when the value is enqueued, it goes from the stack to the heap.
  • The singleton object is truly asynchronous. There is some in-memory processing, but 90% of the time it needs to call outside and go through the internal producer/consumer.
  • ValueTask<> does not seem to fit here, because it only avoids an allocation when the result is available synchronously, and here things are almost always asynchronous.
  • TaskCompletionSource<> has a state, but it is typed as object, so a struct stored there would be boxed (see the snippet after this list).
  • The command also jumps from thread to thread.
  • Recycling objects only works for the command itself; its contents (the TaskCompletionSource<> and a string) cannot be recycled.
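
For example, this is the boxing I mean (a hypothetical RequestState struct; object stands in for the real Response type):

```csharp
using System.Threading.Tasks;

class BoxingDemo
{
    struct RequestState { public int Id; }

    static void Show()
    {
        var state = new RequestState { Id = 42 };

        // The state parameter is typed as object, so the struct is
        // boxed onto the heap right here.
        var tcs = new TaskCompletionSource<object>(state);

        // Getting it back out unboxes a copy.
        var copy = (RequestState)tcs.Task.AsyncState;
    }
}
```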

Is there any way I could leverage structs to reduce the memory usage and/or improve the performance? Any other option?

Vlad
  • Side note: why use `Queue`, which is *not* thread-safe? Try `BlockingCollection` or `ConcurrentQueue` instead. – Dmitry Bychenko Oct 15 '18 at 14:12
  • Could you post the *relevant code*, please? It's difficult to digest a description of the code only. – Dmitry Bychenko Oct 15 '18 at 14:14
  • If you are creating lots of short-lived Gen0 objects and little else, Gen0 collection is pretty cheap. It's the non-Gen0 stuff that costs time and execution cycles. – Flydog57 Oct 15 '18 at 14:18
  • Sorry, I was so carried away by the memory thing that I forgot the real objective: to improve the performance. @DmitryBychenko with just one consumer and one producer, I saw no performance improvement from using a ConcurrentQueue. I use a simple `lock` with Wait and Pulse. – Vlad Oct 15 '18 at 14:32

1 Answer


> Value types can be instantiated on the stack, but if they pass from thread to thread, I guess they must be copied stackA->heap and then heap->stackB.

No, that's not at all true. But you have a deeper problem in your thinking here:

Immediately stop thinking of structs as living on the stack. When you make an int array with a million ints, do you think those four million bytes of ints live on your one-million-byte stack? Of course not.

The truth is that stack vs heap has nothing whatsoever to do with value types. Instead of "stack and heap", start saying "short term allocation pool" and "long term allocation pool". A variable that has a short lifetime is allocated from the short term allocation pool, regardless of whether it contains an int or a reference to an object. Once you start thinking about variable lifetime correctly, your reasoning becomes entirely straightforward. Short-lived things live in the short term pool, obviously.
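
To make that concrete, here is a small illustration (an illustrative sketch, not code from the question):

```csharp
class PoolsDemo
{
    static void M()
    {
        int n = 123;                  // the variable n is short-lived:
                                      // it comes from the short term pool

        int[] a = new int[1_000_000]; // the variable a is also short-lived
                                      // and comes from the short term pool,
                                      // but the million int variables it
                                      // refers to have unknown lifetime, so
                                      // they come from the long term
                                      // (garbage-collected) pool
    }
    // n and a are reclaimed cheaply when M returns; the array's storage is
    // reclaimed whenever the GC determines it is no longer reachable.
}
```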

So: when you pass a struct from one thread to another, does it ever live "on the heap"? The question is nonsensical, because values are not things that live on the heap. Variables are things that are storage; variables store values.

So: Is it the case that turning classes into structs will improve performance because "those structs can live on the stack"? No, of course not. The relevant difference between reference types and value types is not where they live but how they are copied. Value types are copied by value, reference types are copied by reference, and reference copies are the fastest copies.
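
For example (an illustrative sketch; the sizes assume a 64-bit process):

```csharp
class CopyDemo
{
    struct PointStruct { public long X, Y, Z, W; } // 32 bytes of data
    class  PointClass  { public long X, Y, Z, W; }

    static void Show()
    {
        var s1 = new PointStruct();
        var s2 = s1;   // copy by value: all 32 bytes are copied

        var c1 = new PointClass();
        var c2 = c1;   // copy by reference: only the 8-byte reference is
                       // copied; c1 and c2 now refer to the same object
    }
}
```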

> I see LOTS of these objects being created; they top the list of the most common objects in memory. Note that there is no memory leak; the GC can clean everything up nicely each time it triggers. But obviously, creating so many objects so fast makes Gen 0 very big, and I wonder whether better memory usage might yield better performance.

OK, now we come to the sensible part of your question. This is an excellent observation, and it is one which is testable with science. The first thing you should do is use a profiler to determine the actual burden of gen 0 collections on the performance of your application.

It may be that this burden is not the slowest thing in your program and in fact it is irrelevant. In that case, you will now know to concentrate your efforts on the real problem, rather than chasing down memory allocation problems that aren't real problems.

Suppose you discover that gen 0 collections really are killing your performance; what can you do? Is the answer to make more things structs? That can work, but you have to be very careful:

  • If the structs themselves contain references, you've just pushed the problem off one level, you haven't solved it.
  • If the structs are larger than reference size -- and of course they almost always are -- then you are now copying the entire struct rather than copying a reference, and you've traded a GC time problem for a copy time problem. That might be a win, or a loss; use science to find out which it is. (Both caveats are sketched below.)
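
Here is a sketch of both caveats, using hypothetical types:

```csharp
using System.Threading.Tasks;

// Caveat 1: a struct whose fields are reference types still causes the
// same GC-heap allocations; the problem has just moved down one level.
struct CommandStruct
{
    public string Payload;                          // still a heap allocation
    public TaskCompletionSource<object> Completion; // still a heap allocation
}

// Caveat 2: a struct larger than a reference is copied in full on every
// assignment, parameter pass and enqueue.
struct BigValue { public long A, B, C, D, E, F, G, H; } // 64 bytes

class CopyCost
{
    static void PassStruct(BigValue v) { } // 64 bytes copied to get here
    static void PassRef(object o) { }      // 8 bytes copied (on x64)
}
```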

When we were faced with this problem in Roslyn, we thought about it very carefully and did a lot of experiments. The strategy we went with was in general not to move things onto the stack. Rather, we identified how many small, short-lived objects there were active in memory at any one time, of each type -- using a profiler -- and then implemented a pooling strategy on those objects. You need a small object, you take it out of the pool. When you're done, you put it back in the pool. What happens is, you end up with O(number of objects active at any one time) in the pool, which quickly gets moved into the gen 2 heap; you then greatly lower your collection pressure on the gen 0 heap while increasing the cost of comparatively rare gen 2 collections.
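
Here is a minimal sketch of the pooling idea (an illustration only, not Roslyn's actual pool implementation, which is more sophisticated):

```csharp
using System.Collections.Concurrent;

// Pooled objects migrate to gen 2 and stay there; renting and returning
// them produces no new gen 0 allocations once the pool is warm.
public sealed class SimplePool<T> where T : class, new()
{
    private readonly ConcurrentBag<T> _items = new ConcurrentBag<T>();

    // Take an object from the pool, or allocate one if the pool is empty.
    public T Rent() => _items.TryTake(out var item) ? item : new T();

    // The caller must reset the object's per-request state before returning.
    public void Return(T item) => _items.Add(item);
}

// Hypothetical pooled command with a Reset method, for illustration.
public sealed class PooledCommand
{
    public string Payload;
    public void Reset() => Payload = null;
}

public static class PoolUsage
{
    private static readonly SimplePool<PooledCommand> Pool =
        new SimplePool<PooledCommand>();

    public static void HandleOneRequest()
    {
        var command = Pool.Rent();   // no gen 0 allocation once warm
        command.Payload = "work";
        // ... process the command ...
        command.Reset();
        Pool.Return(command);        // object stays alive, ends up in gen 2
    }
}
```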

I'm not saying that's the best choice for you. I'm saying that we had this same problem in Roslyn, and we solved it with science. You can do the same.

Eric Lippert
  • Many thanks! I think I was looking at this the wrong way. – Vlad Oct 16 '18 at 13:41
  • "Variables that have short lifetimes are allocated from the short term allocation pool". Could you please clarify how is a short lifetime array with a million ints allocated? Is the variable that references the array allocated in the short term allocation pool, while the array is allocated in the long term allocation pool? What happens when the life of the variable ends? – Luca Cremonesi Oct 16 '18 at 14:45
  • @LucaCremonesi: An array is a collection of variables; we do not know how long those variables are going to live, so they have to be allocated on the garbage-collected long-term pool. If we have a local variable that holds a reference to that array, and that local variable is short lived, then yes, that local is allocated off the short term pool. When the life of the variable ends, the storage is reclaimed, and there is one fewer GC root of the array. – Eric Lippert Oct 16 '18 at 16:28
  • @LucaCremonesi: Note that C# is permitted to make the lifetimes of short-lived variables longer or shorter as it sees fit, provided that doing so does not violate other rules of C#. If you have a short-lived local that holds a reference, that local is not required to be a root of the GC for the entire time that control is in the scope of the variable; it might be reclaimed early. And similarly, lifetimes may be extended past the point where control leaves the scope. Do not conflate scope and lifetime; they are only weakly connected. – Eric Lippert Oct 16 '18 at 16:29
  • @LucaCremonesi: Also note that in the specific case of a million-int array, in practice it would be allocated off of a special long-term pool specifically for large objects, called, surprise, the Large Object Heap. Large objects are assumed to be *extremely* long-lived in C#, so they get collected less often and moved less often. – Eric Lippert Oct 16 '18 at 16:31
  • "An array is a collection of variables". Why is an array of ints a collection of variables? You mean that myArray[0], myArray[1]... should be considered individual variables? – Luca Cremonesi Oct 16 '18 at 18:12
  • "We do not know how long those variables are going to live". Why don't we? If the array is locally defined and it has a short lifetime (e.g. no closures), as I previously stated, we know that. – Luca Cremonesi Oct 16 '18 at 18:17
  • @LucaCremonesi: Of course they are individual variables. They contain values, and **they can vary**. We call variables variables because variables are *things that are able to vary*. You can use `a[0]` in *any context that requires a variable* -- you can put it on the left side of an assignment, you can make it a `ref` parameter, and so on. It's a variable! – Eric Lippert Oct 16 '18 at 18:45
  • @LucaCremonesi: We don't know that. Suppose a reference to the array is assigned to a field; now there are two references to the array and one of them lives longer. Suppose the array reference is passed to *any function whatsoever*; that function can *also* assign it to a field, and now it lives longer. You don't need just a lack of *closures*, you need a lack of *escapes of any alias of the reference*. – Eric Lippert Oct 16 '18 at 18:48
  • @LucaCremonesi: C# and the CLR could have been written to do conservative escape analysis, and could have chosen to put array element variables on the short term pool in the extremely tiny number of cases where conservative escape analysis says that it is safe to do so. But that optimization is extremely expensive and complex, and gives you a tiny savings that almost never pays off, so C# and the CLR do not in practice do it. – Eric Lippert Oct 16 '18 at 18:49
  • If you want a stack-only type, the only way I'm aware of to do so is via a `ref struct` declaration, introduced in C# 7.2. For example, [Span<T>](https://msdn.microsoft.com/en-us/magazine/mt814808.aspx). `Span<byte> bytes = stackalloc byte[length];` is legal even without an unsafe context. Of course, needing a stack-only type only occurs in highly specialized, advanced scenarios. – Brian Oct 16 '18 at 21:16
  • @Brian: Right, there are several types in C# / the CLR that are Very Special Types that can only exist on the short-term pool. The undocumented ref-to-anything types that implement C-style variadics are another example. It will now be possible to make your own special types, which is... interesting. As you note, it's only relevant in very narrow scenarios, and I am pretty surprised that this made it to the top of the list of possible features to implement. – Eric Lippert Oct 16 '18 at 21:43
  • @EricLippert: C# 7 seems to have focused on data movement/bucketing/performance, so this feature was a good fit. I think the main benefit of `ref struct` is to make libraries (e.g., the .NET Framework) faster. [Quoting the MSDN](https://docs.microsoft.com/en-us/dotnet/csharp/reference-semantics-with-value-types): "You may find that you don't often use these features in the code you write. However, these enhancements have been adopted in many locations in the .NET Framework. As more and more APIs make use of these features, you'll see the performance of your own applications improve." – Brian Oct 18 '18 at 13:13
  • @Brian: That's exactly right. The C# design process deliberately de-emphasizes "insider benefit only" features; for example, when I was on the tools for office team we had many feature requests for C# 3 that did not end up making it into the compiler until C# 4; features that are of particular use for compiler writers only tend to be delayed, and so on. The team quite rightly wants to prioritize more user-facing features. However in recent years many people who researched high-performance memory access features now work on the C# team! So these new features are not a coincidence. :-) – Eric Lippert Oct 18 '18 at 13:19
  • @Brian: I first worked on prototypes of some of these features eight or nine years ago; that's how long they took to make it from prototype quality to the actual top of the list of things to implement, to shipped. – Eric Lippert Oct 18 '18 at 13:20