
Recently I was asked in an interview whether strings in C# can end up on the LOH (Large Object Heap). The interviewer mentioned that there is an optimization in the GC logic that splits a single massive string into several smaller ones, so such a string never reaches the LOH.

I didn't find any related information in these MSDN articles: https://learn.microsoft.com/en-us/dotnet/standard/garbage-collection/large-object-heap and https://learn.microsoft.com/en-us/archive/msdn-magazine/2008/june/clr-inside-out-large-object-heap-uncovered

So are there any implications or optimizations in the CLR regarding storing strings in the LOH? Is it somehow related to string interning?

SvjMan
  • I would have answered that this is a quite specific implementation detail that _extremely few_ people would know about, and that it will most likely have no significant performance impact except in a few edge cases. Knowing how C# works under the hood is a useful bonus skill, but not as important as knowing the "higher-level" concepts of writing good software. – Franz Gleichmann Feb 24 '21 at 17:30
  • I was surprised to hear such a question in an interview; that's why I want to double-check whether the interviewer was right. – SvjMan Feb 24 '21 at 17:35
  • I was giving an informal internal class on the GC. To create garbage, I had a `List` into which I added `count` instances of `new string('*', allocationSize)`. The `allocationSize` variable was actually a random number centered around a mean. When the mean got large enough (i.e., > 85k), I observed what I thought were LOH effects. Anyone who took part probably believed me. – Flydog57 Feb 24 '21 at 17:36
  • And, of course, the `string` object itself won't be LOH resident; the private buffer(s) it uses will be. – Flydog57 Feb 24 '21 at 17:44
  • There is no such optimization. I wonder how the interviewer came up with such an idea... – Konrad Kokosa Feb 24 '21 at 17:56
  • @Flydog57 The string and its buffer are the same thing: `string` is a special case of a variable-size object, similar to arrays. – Charlieface Feb 24 '21 at 18:10
  • @charlieface: Hmm. Never knew that! Thanks! – Flydog57 Feb 24 '21 at 20:12
  • @Flydog57 See https://github.com/dotnet/runtime/blob/eb03e0f7bc396736c7ac59cf8f135d7c632860dd/src/libraries/System.Private.CoreLib/src/System/String.cs#L25 for the managed side; the unmanaged side is here: https://github.com/dotnet/runtime/blob/master/src/coreclr/vm/object.h – Charlieface Feb 24 '21 at 20:31
  • Possibly the interviewer had confused the implementation of `string` (which necessarily must be a single contiguous buffer) with that of `StringBuilder` (which in the past has been a single resizable buffer, similar to `List`, etc., but which currently is a linked list of buffers). All of these are implementation details that could change at any time, so this isn't so much a _practical_ programming problem as an _academic_ one. – Peter Duniho Feb 24 '21 at 21:11
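Flydog57's experiment and Charlieface's point (a string stores its characters inline, like an array) can be checked with a minimal C# sketch. It assumes a 64-bit .NET runtime with default GC settings. `StringLohDemo` and `ApproximateStringSize` are hypothetical names introduced here, and the size formula is only a rough approximation of the object layout. The only public signal used is `GC.GetGeneration`, which reports LOH objects as generation 2.

```csharp
using System;
using System.Collections.Generic;

public static class StringLohDemo
{
    // Hypothetical helper (not a runtime API): rough size of a string object on a
    // 64-bit runtime = object header (8) + method table pointer (8) + length (4)
    // + 2 bytes per UTF-16 char + 2-byte null terminator. Padding is ignored.
    public static long ApproximateStringSize(int length) => 8 + 8 + 4 + 2L * length + 2;

    public static void Main()
    {
        // Same idea as the List-of-strings experiment in the comments, with fixed sizes.
        var strings = new List<string>
        {
            new string('*', 40_000), // ~80 KB: below 85,000 bytes, expected on the SOH
            new string('*', 50_000), // ~100 KB: above 85,000 bytes, expected on the LOH
        };

        foreach (string s in strings)
        {
            // GC.GetGeneration reports LOH objects as generation 2; a freshly
            // allocated small object is normally still in generation 0.
            Console.WriteLine(
                $"{s.Length} chars, ~{ApproximateStringSize(s.Length)} bytes, " +
                $"generation {GC.GetGeneration(s)}");
        }
    }
}
```

On a typical 64-bit runtime one would expect the 40,000-char string to report generation 0 and the 50,000-char string generation 2: the large string is allocated as one contiguous object on the LOH rather than being split into smaller pieces.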

1 Answer


I think the interviewer wanted to hear about the String Intern Pool, also known as the LargeHeapHandleTable.

A common mistake is to assume that interned strings are stored inside the String Intern Pool in the LOH.

In reality, an interned string has an entry (a hash) in the LargeHeapHandleTable, and that entry references the string object itself, which lives on either the Small Object Heap (SOH) or the Large Object Heap (LOH).

If an interned string is 85,000 bytes or larger, it is located in the LOH; otherwise it is in the SOH, where it ends up in generation 2 and is kept until the application finishes.
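As a minimal sketch of the interning case, assuming a 64-bit .NET runtime: the code below builds a ~100 KB string at run time (so it is not a literal and not yet interned), interns it, and checks that the pooled reference points at that same large object, which `GC.GetGeneration` reports as generation 2. `InternLohDemo` and `InternLargeString` are hypothetical names for illustration. The sketch cannot inspect the LargeHeapHandleTable itself, since that is a CLR-internal structure with no public API; it only observes the effects through `String.Intern` and `String.IsInterned`.

```csharp
using System;

public static class InternLohDemo
{
    // Hypothetical helper: builds a large string at run time and interns it.
    public static (string Original, string Pooled) InternLargeString(int length)
    {
        // ~2 * length bytes of chars; on the LOH once the object is >= 85,000 bytes.
        string big = new string('x', length);

        // Per the String.Intern docs: if no equal string is in the pool yet, a
        // reference to this instance is added and returned; otherwise the
        // already-pooled instance is returned.
        string pooled = string.Intern(big);
        return (big, pooled);
    }

    public static void Main()
    {
        var (original, pooled) = InternLargeString(50_000);

        // The pool stores a reference to the same object, so the large string
        // stays where it was allocated rather than being split into pieces.
        Console.WriteLine($"Same instance: {ReferenceEquals(original, pooled)}");
        Console.WriteLine($"Interned: {string.IsInterned(original) != null}");
        Console.WriteLine($"Generation: {GC.GetGeneration(pooled)}");
    }
}
```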

[Image: example of an interned string] https://i.stack.imgur.com/fD0WR.png

This is described in chapter 4 of Pro .NET Memory Management by Konrad Kokosa.