
NOTE: My case is in the ecosystem of an old API that only works with strings, with no modern .NET additions.

So I have a strong need for a mutable string that causes no allocations. The string is updated every X ms, so you can imagine how much garbage it produces in just a few minutes (StringBuilder is not relevant here at all). My current approach is to pre-allocate a string of fixed size and mutate it via pinning, writing characters directly and either silently dropping the overflow or throwing when capacity is reached.

This works fine. The allocated string is long-lived, so eventually the GC will promote it to Gen2 and pinning won't bother it that much, minimizing overhead. There are two major issues, though:

  1. Because the string has a fixed length, I have to pad it with \0. While this has worked fine so far with all default .NET/Mono functionality and third-party code, there is no way of telling how something else will react when a string has a length of 1024 but the last 100 characters are \0.
  2. I can't resize it, because that would incur an allocation. I could accept one allocation once in a blue moon, but since the string is fairly dynamic I can't be sure when it will need to expand or shrink again. I COULD use an "expand only" approach, so that I allocate only when expansion is needed. However, this has the disadvantage of padding overhead (if the string has expanded to 5k characters but the next string is just 3k, 2k characters will be padding that costs extra cycles) as well as extra memory usage. I'm also not sure how the GC will feel about a huge, often-pinned string that lives in Gen2 rather than in the LOH. Another option would be to pool reusable string objects; however, this has higher memory and GC overhead, plus lookup overhead.
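To make issue 1 concrete, here is a minimal sketch that builds a \0-padded string by ordinary concatenation (no pinning or mutation involved). `PaddingDemo` and `MakePadded` are hypothetical names used only for illustration. It shows that `Length`, equality and hashing all see the padding, and that recovering the "clean" text needs a new string, which is exactly the allocation being avoided.

```csharp
using System;

public static class PaddingDemo
{
    // Hypothetical helper: produces a fixed-capacity string whose unused tail
    // is filled with '\0', mimicking the pre-allocated buffer described above.
    public static string MakePadded(string text, int capacity)
    {
        if (text.Length > capacity)
            throw new ArgumentException("text exceeds capacity");
        return text + new string('\0', capacity - text.Length);
    }

    public static void Show()
    {
        string padded = MakePadded("Score: 42", 16);

        // Length counts the padding, not just the meaningful characters.
        Console.WriteLine(padded.Length);                        // 16
        // Ordinal equality (and hashing) also see the '\0' characters.
        Console.WriteLine(padded == "Score: 42");                // False
        // Getting the meaningful text back requires a new string (an allocation).
        Console.WriteLine(padded.TrimEnd('\0') == "Score: 42");  // True
    }
}
```

Any consumer that uses the string as a dictionary key, compares it, or measures it would therefore behave differently from how it would with the unpadded text.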

Since the target string has to live for quite some time, I was thinking about moving it into unmanaged memory, via a byte buffer. This would remove the burden from the GC (the pinning penalty) and I could resize/reallocate at lower cost than on the managed heap. What I'm having a hard time understanding is how I could slice a specific part of an allocated unmanaged buffer and wrap it as a normal .NET string to use in managed code. For example, to pass it to Console.WriteLine or to a third-party library that draws a UI label on screen and accepts a string. Is this even doable?

P.S. As far as I know, the plan for .NET 5 (to be finalized in .NET 6, I think) is that you will no longer be able to mutate things like strings (either blocked at runtime or resulting in undefined failures). Their solution seems to be the POH (Pinned Object Heap), which is essentially what I describe, with the same limitations.
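For reference, a minimal sketch of how the POH is reached on .NET 5 and later: `GC.AllocateArray<T>` with `pinned: true`. `PinnedBufferDemo` and `AllocatePinned` are hypothetical names. Note that the POH holds arrays of reference-free element types, not strings, so a string-only API would still need a copy of the buffer's contents.

```csharp
using System;

public static class PinnedBufferDemo
{
    // .NET 5+ only: allocates a char[] directly on the Pinned Object Heap.
    // The GC never moves objects there, so taking a pointer to the array
    // does not interfere with compaction of the regular heaps.
    public static char[] AllocatePinned(int capacity) =>
        GC.AllocateArray<char>(capacity, pinned: true);
}
```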

KreonZZ
  • "*GC will promote it to Gen2 and pinning won't bother it that much, minimizing overhead.*" That's not necessarily true. If the string was pinned while it was in gen0, the GC might decide not to promote it at all! If it is promoted, it's still going to get in the way of compaction. If it was pinned *while in gen2*, you might be right. – canton7 Apr 07 '21 at 09:08
  • I'd really encourage you not to go down the route of mutating strings; there are many things which assume that strings don't mutate. Can't you get away with a char array? .NET strings start with the length, so you can't slice an arbitrary bit of a char array into a string, as it won't have the appropriate header on it. – canton7 Apr 07 '21 at 09:12
  • If you can, I'd keep a `char[]` in memory, which is mutated. When you need to turn that into a string in order to pass to `Console.WriteLine` etc (I assume you're not doing that every X ms!), use `string.Create` to copy the relevant part of your `char[]` into a new string. – canton7 Apr 07 '21 at 09:13
  • No, I have an old API/ecosystem that only accepts Strings. If I could use a char array or Span, this question wouldn't exist. I've already considered all the suggestions you mentioned many times; they are simply not relevant in this specific case. – KreonZZ Apr 07 '21 at 09:17
  • Right, but since you didn't mention them, people are going to suggest them. I'm not a mind-reader! – canton7 Apr 07 '21 at 09:17
  • @canton7 *".NET strings start with the length, so you can't slice an arbitrary bit of a char array into a string"*: yes, I know. What I meant is that I have a block of memory in which I can arbitrarily choose the position where I start writing the data itself and where to put specific metadata. Ideally I put the String identifier data at the start, then construct a pointer, and I now have a fairly valid string object in memory with a pointer to it. *In theory*. – KreonZZ Apr 07 '21 at 09:19
  • @canton7 Yes, I need a ready-to-use String every X ms, which is why I was talking about mutating it in the first place. If I use a char array, I only address the intermediate issue; the allocation still happens when the final string is needed. – KreonZZ Apr 07 '21 at 09:21

1 Answer


how can I possibly slice specific part of allocated unmanaged buffer and wrap it as a normal net string to use in managed space/code

As far as I know, this is not possible. .NET has its own way of laying out objects (object headers, etc.), so you cannot treat an arbitrary memory region as a .NET object. Pinning and mutating a string seems dangerous, since strings are intended to be immutable and some things might not work correctly (using the string as a dictionary key, for example).
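What *is* possible is copying from unmanaged memory into a new managed string. The following is a minimal sketch using the standard `System.Runtime.InteropServices.Marshal` APIs, which also exist on older .NET Framework and Mono. `UnmanagedCopyDemo` and `CopySlice` are hypothetical names. `Marshal.PtrToStringUni` always allocates a fresh string, so this does not avoid the allocation; it only shows where the boundary between the two worlds lies.

```csharp
using System;
using System.Runtime.InteropServices;

public static class UnmanagedCopyDemo
{
    // Writes UTF-16 text into an unmanaged buffer, then copies a slice of it
    // into a brand-new managed string. The runtime cannot wrap the buffer in
    // place; PtrToStringUni allocates a copy every time it is called.
    public static string CopySlice(string source, int start, int length)
    {
        // Two bytes per UTF-16 code unit.
        IntPtr buffer = Marshal.AllocHGlobal(source.Length * 2);
        try
        {
            for (int i = 0; i < source.Length; i++)
                Marshal.WriteInt16(buffer, i * 2, (short)source[i]);

            // Offset the pointer to the first character of the slice.
            IntPtr sliceStart = IntPtr.Add(buffer, start * 2);
            return Marshal.PtrToStringUni(sliceStart, length);
        }
        finally
        {
            Marshal.FreeHGlobal(buffer);
        }
    }
}
```

For example, `CopySlice("Hello, world", 7, 5)` yields a new string `"world"`.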

The correct way would be (as canton7 mentions) to use a char[] buffer and Span<char>/Memory<char> for slicing it. When passing the text to other methods, you can convert a slice of the buffer to an actual string object. When calling methods like Console.WriteLine or UI methods, the overhead of allocating the string object will be irrelevant compared to everything else that is going on.
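A minimal sketch of that approach, assuming a runtime with `Span<T>` (.NET Core 2.1+, or the System.Memory package on older frameworks). `CharBuffer` is a hypothetical class, not an existing API. The buffer is mutated in place without allocating, and exactly one string is created when a string-only API needs one. It uses the `new string(char[], int, int)` constructor; `string.Create`, as suggested in the comments, is an alternative on newer runtimes.

```csharp
using System;

// Hypothetical helper: a reusable, fixed-capacity char buffer that is mutated
// in place and only turned into a string when a string-only API needs one.
public sealed class CharBuffer
{
    private readonly char[] _chars;
    private int _length;

    public CharBuffer(int capacity) => _chars = new char[capacity];

    public int Length => _length;

    // Replaces the contents without allocating; throws if capacity is exceeded.
    public void Set(ReadOnlySpan<char> text)
    {
        if (text.Length > _chars.Length)
            throw new ArgumentOutOfRangeException(nameof(text), "Capacity exceeded.");
        text.CopyTo(_chars);
        _length = text.Length;
    }

    // A non-allocating view of the meaningful part of the buffer.
    public ReadOnlySpan<char> AsSpan() => new ReadOnlySpan<char>(_chars, 0, _length);

    // Allocates one string containing only the meaningful characters (no padding).
    public override string ToString() => new string(_chars, 0, _length);
}
```

Code that can consume `ReadOnlySpan<char>` uses `AsSpan()` for free; the old string-only API calls `ToString()`, which is the one unavoidable allocation per update.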

If you have old code that only accepts strings, you either need to accept the limitations this entails or rewrite the code to accept Memory/Span representations.

I would highly recommend profiling to see whether the frequent allocations are an actual problem. As long as the string fits in the small object heap (SOH, i.e. is smaller than 85,000 bytes, which for a UTF-16 string means roughly 42,000 characters) and is not promoted to gen 2, the overhead might not be huge. Allocations on the SOH are fast, and the time to run a gen 0 GC does not scale directly with the amount allocated (it depends mostly on how much survives). So updating every few milliseconds might not be terrible. I would be more worried if you were talking about microseconds.
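As a rough first measurement before reaching for a full profiler, here is a minimal sketch that counts bytes allocated and gen 0 collections around a loop of updates. It assumes .NET Core or .NET 5+, where `GC.GetAllocatedBytesForCurrentThread` is available; `GC.CollectionCount` exists on all runtimes. `AllocationProbe` and `Measure` are hypothetical names.

```csharp
using System;

public static class AllocationProbe
{
    // Runs `update` the given number of times and reports how many bytes the
    // current thread allocated and how many gen 0 collections occurred meanwhile.
    public static (long Bytes, int Gen0Collections) Measure(Action update, int iterations)
    {
        int gen0Before = GC.CollectionCount(0);
        long bytesBefore = GC.GetAllocatedBytesForCurrentThread();

        for (int i = 0; i < iterations; i++)
            update();

        long bytes = GC.GetAllocatedBytesForCurrentThread() - bytesBefore;
        int gen0 = GC.CollectionCount(0) - gen0Before;
        return (bytes, gen0);
    }
}
```

Passing a lambda that builds the label string the way the real code does, with an iteration count matching a few minutes of updates, gives a quick sense of how many gen 0 GCs the allocations actually trigger.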

JonasH