private const int RESULT_LENGTH = 10;

public static unsafe string Encode1(byte[] data)
{
    var result = new string('0', RESULT_LENGTH); // memory allocation

    fixed (char* c = result)
    {
        for (int i = 0; i < RESULT_LENGTH; i++)
        {
            c[i] = DetermineChar(data, i);
        }
    }

    return result;
}


public static string Encode2(byte[] data)
{
    var chars = new char[RESULT_LENGTH]; // memory allocation

    for (int i = 0; i < RESULT_LENGTH; i++)
    {
        chars[i] = DetermineChar(data, i);
    }

    return new string(chars); // again a memory allocation
}

private static char DetermineChar(byte[] data, int index)
{
    // dummy algorithm.
    return 'a';
}

Both methods encode a byte array to a string according to some specific algorithm. The first creates a string and writes to the individual chars using pointers. The second creates an array of chars and then uses that array to instantiate a string.

I know strings are immutable and that multiple string declarations can point to the same allocated memory. Also, according to this article, you should not use unsafe string modifications unless it is absolutely necessary.

My question: When is it safe to use 'unsafe string modifications' as used in the Encode1 sample code?

PS. I'm aware of newer concepts such as Span and Memory, and the string.Create method. I'm just curious about this specific case.

Edit

Thank you for all your responses. Maybe the word 'safe' in my question caused more confusion than it did good. I didn't mean it as the opposite of the unsafe keyword but in a vernacular sense.

  • I think you're conflating the meaning of `unsafe` here. There's no such thing as "safe" `unsafe` code. The `unsafe` keyword is there to remind you that the usual safety mechanisms (such as buffer overrun protection) do not apply. – Robert Harvey Aug 03 '18 at 15:23
  • It's safe to use unsafe when the unsafe is safe - i.e. when you are certain you have not introduced any issues that would otherwise be dealt with by the CLR. – Alex K. Aug 03 '18 at 15:24
  • It is very naughty. But you'll get away with it, this is not an interned string. StringBuilder is the safe alternative. – Hans Passant Aug 03 '18 at 15:24
  • You should clarify what you mean by **safe**. There are any number of issues that you could introduce with `unsafe`. They could be serious or benign. – Dan Wilson Aug 03 '18 at 15:26

1 Answer

Ultimately, the only time this is "safe" (in the vernacular sense, not in the unsafe sense) is when you own the string and it has not yet been exposed to any external code that may expect it to be immutable. The only time it is common to see this scenario is when you're constructing a new string and you can't just use the GetString methods on an Encoding - for example, because the source data is discontiguous and may span multiple Encoder steps.

So basically, the scenario shown in Encode1, where it allocates a new string with a known length and then immediately overwrites the character data, is the only reasonable usage. Once the string is in the wild: leave it immutable.

However, if you can avoid it at all: I would. It definitely makes sense in the context of Encode1, but...
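For comparison, here is a minimal sketch of the `string.Create` route mentioned in the question, assuming a runtime that has it (.NET Core 2.1 or later). The names `EncoderSketch` and `Encode3` are hypothetical; `DetermineChar` is the dummy from the question. It fills the string before anyone else can see it, which is the same ownership argument as above, but without `unsafe`:

```csharp
using System;

public static class EncoderSketch
{
    private const int RESULT_LENGTH = 10;

    // Hypothetical third variant: one allocation, no unsafe code.
    public static string Encode3(byte[] data)
    {
        // string.Create allocates the string and hands the callback a writable
        // Span<char> over its characters before the string is returned to anyone.
        return string.Create(RESULT_LENGTH, data, (chars, state) =>
        {
            for (int i = 0; i < chars.Length; i++)
            {
                chars[i] = DetermineChar(state, i);
            }
        });
    }

    private static char DetermineChar(byte[] data, int index)
    {
        // Same dummy algorithm as in the question.
        return 'a';
    }

    public static void Main()
    {
        Console.WriteLine(Encode3(new byte[] { 1, 2, 3 })); // prints "aaaaaaaaaa"
    }
}
```

The byte array is passed as the state argument rather than captured, so the lambda does not need a closure.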

One scenario to be especially cautious of: interned strings (constants, literals, etc.); you don't own these.
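A small sketch of why literals are not yours to modify (the class and variable names are made up for illustration). It deliberately does not mutate anything; it only shows that two identical literals are one shared object, whereas a string from `new string(...)` is a separate instance that nobody else holds yet:

```csharp
using System;

public static class OwnershipSketch
{
    public static void Main()
    {
        string literalA = "0000000000";
        string literalB = "0000000000";      // same literal, same interned instance
        string fresh = new string('0', 10);  // newly allocated, not shared

        // Identical literals refer to one object, so writing through a
        // pointer to literalA would also change what literalB sees.
        Console.WriteLine(object.ReferenceEquals(literalA, literalB)); // True

        // The fresh string has equal contents but is a different object.
        Console.WriteLine(object.ReferenceEquals(literalA, fresh));    // False
        Console.WriteLine(literalA == fresh);                          // True
    }
}
```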

Marc Gravell
  • IMHO, .NET would benefit from a `String` constructor or static factory function that could build a string given an array of `{Object, Start, Length}` structures, where each Object could either be a `String`, `StringBuilder`, `Char[]`, or a special flag object indicating that it should generate `Length` copies of `(Char)Start`. Such a feature would eliminate much of the need to build strings in more complicated fashion. Java could also benefit from such a feature, though for efficiency such a function may need to receive separate arrays for the objects and parameters. – supercat Aug 03 '18 at 16:26
  • @supercat: If I understand you correctly, you mean something like the `string.Create(....)` method (.NET Core 2.1)? Search for 'string.Create' in [this](https://msdn.microsoft.com/en-us/magazine/mt814808.aspx) post written by Stephen Toub. – Coen van den Munckhof Aug 03 '18 at 21:26
  • @CoenvandenMunckhof: I've not been following .NET for quite a while. I've long thought it should support a `Span` concept, and I'm glad to see it's received one, though I would have favored a `ReadableSpan` base type with derived types `WritableSpan` and `ReadonlySpan`. The `String.Create` looks interesting, though using arguments of type `ref TState` might have been nicer than `TState` in some cases. – supercat Aug 03 '18 at 22:21
  • @supercat in implementation: spans are structs (actually "ref structs"), so no "derived types" etc; instead, there are implicit conversion operators - so a `Span` can be implicitly treated as a `ReadOnlySpan` – Marc Gravell Aug 03 '18 at 23:13
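The implicit conversion described in the last comment can be sketched as follows, assuming C# 7.2 or later. `SpanConversionSketch` and `CountLetter` are hypothetical names used only for this illustration:

```csharp
using System;

public static class SpanConversionSketch
{
    // Hypothetical helper that only needs read access to the characters.
    private static int CountLetter(ReadOnlySpan<char> text, char letter)
    {
        int count = 0;
        foreach (char c in text)
        {
            if (c == letter) count++;
        }
        return count;
    }

    public static void Main()
    {
        Span<char> buffer = stackalloc char[4];
        buffer.Fill('a');
        buffer[3] = 'b';

        // No cast needed: Span<char> converts implicitly to ReadOnlySpan<char>,
        // both at the call site and in a plain assignment.
        ReadOnlySpan<char> readOnly = buffer;

        Console.WriteLine(CountLetter(buffer, 'a')); // 3
        Console.WriteLine(readOnly.Length);          // 4
    }
}
```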