
I understand that generational garbage collection improves performance, since

  1. Any object will have to be moved at most twice in non-Gen2 collections, and Gen2 collections are rare.
  2. If the system is performing a Gen0 collection and an object (Gen1 or Gen2) hasn't been written since the last Gen0 collection, the system won't have to scan that object to tag any references therein (since they'll all be Gen1 or Gen2). Likewise, if the system is performing a Gen1 collection, it can ignore any object not written since the last Gen1 collection. Since most of the work of garbage collection is scanning objects, reducing scanning time is a big win (a toy model of this card-marking idea is sketched below).
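As an aside, here's a toy model of that card-marking scheme (deliberately simplified, with hypothetical names; the real CLR tracks dirty "cards" via a write barrier the JIT emits on reference writes, not this bookkeeping):

```csharp
using System;
using System.Collections.Generic;

// Toy model of card-marking; all names here are hypothetical.
class CardTableSketch
{
    const int CardSize = 64; // addresses covered per card (arbitrary here)
    static readonly HashSet<int> dirtyCards = new HashSet<int>();

    // "Write barrier": mark the card containing the written address dirty.
    static void WriteBarrier(int address)
    {
        dirtyCards.Add(address / CardSize);
    }

    // Gen0 collection: only older objects on dirty cards can possibly
    // hold references into Gen0, so objects on clean cards are skipped.
    static void ScanOlderObjects(IEnumerable<int> olderObjectAddresses)
    {
        foreach (int addr in olderObjectAddresses)
        {
            if (!dirtyCards.Contains(addr / CardSize))
                continue; // unwritten since the last Gen0 GC: skip the scan
            Console.WriteLine("scanning object at {0} for Gen0 references", addr);
        }
        dirtyCards.Clear(); // processed cards are clean for the next cycle
    }

    static void Main()
    {
        WriteBarrier(130);                        // a write into one older object
        ScanOlderObjects(new[] { 10, 130, 500 }); // only 130's card is dirty
    }
}
```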

I'm curious, though, what performance advantage there could be to omitting large objects from Gen1 garbage collection? Large objects aren't relocated even when they're scanned by the garbage collector, and I would expect that Gen1 collections will still have to scan their contents unless or until two consecutive Gen1 collections occur without intervening object writes.
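To make the premise concrete, a minimal sketch (relying on the documented 85,000-byte LOH threshold and on `GC.GetGeneration` reporting LOH objects as Gen2 from the moment they're allocated):

```csharp
using System;

class LohGenerationDemo
{
    static void Main()
    {
        var small = new byte[1000];   // ordinary (small object) heap: starts in Gen0
        var large = new byte[100000]; // over the 85,000-byte threshold: goes on the LOH

        Console.WriteLine(GC.GetGeneration(small)); // prints 0
        Console.WriteLine(GC.GetGeneration(large)); // prints 2: the LOH is logically part of Gen2
    }
}
```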

Is there some performance advantage I'm not seeing?

supercat

1 Answer


> I'm curious, though, what performance advantage there could be to omitting large objects from Gen1 garbage collection?

There are two things:

First, large objects (large enough to be on the LOH) tend to have longer lifetimes. Short-lived large objects are rare, and in the cases where one is needed, you can typically reuse it. By not scanning large objects in the younger generations, you're avoiding scans that would nearly always result in keeping the objects anyway, which tends to be a net performance win.
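For example, a sketch of the reuse pattern using `System.Buffers.ArrayPool<T>` (which arrived well after this answer, via the System.Buffers package; any pooling scheme of your own works the same way):

```csharp
using System;
using System.Buffers;

class LargeBufferReuse
{
    static void ProcessChunk(Action<byte[]> work)
    {
        // Rent a LOH-sized buffer from the shared pool instead of
        // allocating a fresh large array on every call.
        byte[] buffer = ArrayPool<byte>.Shared.Rent(100000);
        try
        {
            work(buffer); // note: the rented array may be larger than requested
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
```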

Also, in order to treat objects as Gen1 effectively, and get many of the advantages of having a Gen1 at all, the collector would need to be able to compact them. A large advantage of the generational collector is that you're compacting the newest allocations, which tends to keep memory pressure lower because fragmentation stays better behaved. Large objects in Gen1 would cause performance or memory issues: either they'd require compacting (right now, large objects are never compacted, since it's expensive to "move" them), or they'd cause extra fragmentation to creep in.
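As a toy illustration of the fragmentation half of this (the exact layout depends on the runtime's allocator, so treat this as the allocation pattern that creates holes, not a guaranteed repro):

```csharp
using System;
using System.Collections.Generic;

class LohFragmentationSketch
{
    static void Main()
    {
        var survivors = new List<byte[]>();

        // Interleave short-lived and long-lived large allocations on the LOH.
        for (int i = 0; i < 50; i++)
        {
            var temp = new byte[100000];      // becomes garbage after this iteration
            survivors.Add(new byte[100000]);  // survives, sitting between the holes
            GC.KeepAlive(temp);
        }

        // A full collection frees the temporaries, but since the LOH is not
        // compacted, the survivors stay put and 100,000-byte holes remain.
        GC.Collect();

        // A larger request can't fit in any of those holes, so the LOH must
        // grow even though plenty of total free space exists.
        var bigger = new byte[150000];
        GC.KeepAlive(bigger);
        Console.WriteLine("LOH grew despite ample free space in the holes");
    }
}
```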

Reed Copsey
  • The LOH is known to cause fragmentation problems leading to OutOfMemoryExceptions in certain circumstances. I understand it's expensive to compact, but as the developer of a program at least give me the option to turn it on. – DavidN Jun 24 '11 at 17:50
  • @DavidN: I'm having LOH fragmentation problems, and I suspect that some objects that should be short-lived (but of course aren't) are probably contributing. To be sure, compacting the LOH before throwing OutOfMemoryException would make fragmentation much less of a problem, but I'm curious about the rationale for the policy. – supercat Jun 24 '11 at 17:58
  • @supercat: Best article I've found on this topic. http://www.simple-talk.com/dotnet/.net-framework/the-dangers-of-the-large-object-heap/ – DavidN Jun 24 '11 at 18:07
  • @DavidN: Perhaps the rationale is that if one only collects LOH objects during a Gen2 collection, one will be collecting more objects at once, and thus generating bigger holes. I can somewhat see that, but waiting until an application has generated a gig worth of junk before cleaning it up means the stuff one is keeping will have nice consolidated holes within it, but will be spread out over a gig. Making space available more often would cause holes to be filled in sooner, but the fragmentation wouldn't sprawl so much through memory. – supercat Jun 24 '11 at 18:10
  • @DavidN: I'd read that article (agree it's a nice one, btw--thanks), but I don't think I've seen any explanation of why LOH collections are deferred until Gen2. It seems to me the real effect of deferring LOH collection is to increase the likelihood that large objects which should be short-lived survive long enough to have smaller longer-lived objects placed after them. – supercat Jun 24 '11 at 18:32
  • @Reed Copsey: I can understand a desire to allocate the large objects in a separate heap. What I don't understand is what benefit is gained by not freeing up the memory for a large object if at the time of its first Gen0 or Gen1 collection no references to it exist? By my understanding of how GC works, an object must be scanned during the first Gen0 collection after its creation since it may hold references to other Gen0 objects. Likewise it must be scanned during the first Gen1 collection. If the system is going to have to scan large objects during the Gen0 and Gen1 collections... – supercat Jun 24 '11 at 18:37
  • @Reed Copsey: ...why not use the opportunity to free them up if they're no longer used? If a large object which has survived a Gen0 collection but hasn't yet survived a Gen1 collection holds the only surviving references to 50,000 objects, I would think it would be cheaper for a Gen1 collection to free the object than to tag all 50,000 references therein, forcing them to survive until Gen2. Is there some optimization I don't know about that allows .net to skip the work of tagging those objects? – supercat Jun 24 '11 at 18:41
  • @DavidN: Yes, I agree to some extent. Ideally, I'd love to have a `GC.CompactLargeObjects()` method or something like that, but it's not there now. – Reed Copsey Jun 24 '11 at 19:13
  • @supercat: The problem is that this would cause a lot of extra scanning in Gen0 (and Gen0 gets scanned very, very frequently), and even in Gen1, when, most of the time, a "large object" is a "long-lived object" - including the references held internally within it. The overall effect is that it would cause extra scanning *in most cases*. I agree there are times when this works against you, but if you accept the argument that large objects tend to be long-lived objects, then the decision makes some sense. – Reed Copsey Jun 24 '11 at 19:17
  • @Reed Copsey: By what mechanism can .Net avoid scanning a large object during the first Gen0 or Gen1 collection after its instantiation? Unless .net has some trick I don't know about (entirely possible) it's going to have to scan all the objects within the large object on the first gen0 collection after its instantiation, and again on the first gen1 collection. If it doesn't scan through the large object, how can it know whether to keep alive any object to which no references exist outside the large one? – supercat Jun 24 '11 at 22:36
  • @supercat: It can't - any small objects within the large object will get found in the scan. It's more the large object itself doesn't have to get marked and dealt with... – Reed Copsey Jun 24 '11 at 22:37
  • @Reed Copsey: Why is tagging a large object more expensive than a small one? Relocation would be expensive (but gets skipped for large objects anyway). Deletion might be a little expensive (having to update the free-list and what-not) but would have to happen eventually in any case. Setting the flag bits should be cheap. So why should checking whether a 100K string is still alive be any more expensive than checking whether a 32-byte object is? – supercat Jun 24 '11 at 22:44
  • As of .NET 4.5.1 you can now set a property to have the LOH compacted during the next full GC (sketched below). It then resets, i.e. you need to set it every time you want the LOH compacted. https://msdn.microsoft.com/en-us/library/system.runtime.gcsettings.largeobjectheapcompactionmode(v=vs.110).aspx – Ben Hall Jul 21 '18 at 08:21
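For completeness, a minimal sketch of the API Ben Hall's comment mentions (requires .NET Framework 4.5.1 or later; the setting applies to the next blocking full collection and then resets to Default):

```csharp
using System;
using System.Runtime;

class CompactLohOnce
{
    static void Main()
    {
        // Request LOH compaction during the next blocking Gen2 collection.
        GCSettings.LargeObjectHeapCompactionMode =
            GCLargeObjectHeapCompactionMode.CompactOnce;
        GC.Collect(); // after this collection, the mode resets to Default
    }
}
```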