
https://dlang.org/library/std/string/to_stringz.html

In my understanding it could not work:

toStringz creates an array on the stack and returns its pointer. After toStringz returns, the array on the stack is discarded and the pointer becomes invalid.

But I suppose it does work, since it is part of the standard library. So what is wrong with my understanding above?

Another related question:

What does `scope return` in the signature of this function mean? I visited https://dlang.org/spec/function.html but found no mention of `scope return` there.

Some programmer dude
porton

2 Answers


It does not create an array on the stack. If necessary, it allocates a new string on the GC heap.

The implementation works by checking the existing string for a zero terminator, if it deems it possible to do so without a memory fault. It guesses this from the alignment of the address just past the end of the string: if that address is a multiple of four, it doesn't risk the read; if it is not, it reads that one byte past the end, because fault boundaries fall on multiples of four.

If there is a zero byte already there, it returns the input unmodified. That's what the return thing in the signature means - it may return that same input. (This is a new feature that just got documented... yesterday. And it isn't even merged yet: https://github.com/dlang/dlang.org/pull/2536 But the stdlib docs are rebuilt from the master branch lol)
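As a rough illustration (not the actual Phobos source), the check described above, together with the fallback copy, could be sketched like this. `toStringzSketch` is a made-up name, and the details are my assumptions about how such a check can be written:

```d
import core.stdc.string : strlen;

// Hypothetical re-creation of the described logic; NOT the Phobos code.
immutable(char)* toStringzSketch(string s)
{
    if (s.length == 0)
        return "".ptr; // string literals are zero-terminated

    // Address just past the end of the string.
    immutable(char)* p = s.ptr + s.length;

    // Only peek when p is not 4-byte aligned, on the assumption that
    // such an address cannot be the start of a new, unreadable block.
    if ((cast(size_t) p & 3) != 0 && *p == 0)
        return s.ptr; // already terminated: return the input itself

    // Otherwise allocate on the GC heap, copy, and append the zero.
    auto copy = new char[s.length + 1];
    copy[0 .. s.length] = s[];
    copy[s.length] = '\0';
    return cast(immutable(char)*) copy.ptr;
}

void main()
{
    string s = "hello";
    auto z = toStringzSketch(s);
    assert(strlen(z) == s.length);
}
```

Either branch gives the caller a zero-terminated pointer; the only difference is whether it points at the original data or at a fresh GC allocation.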

Anyway, if there isn't a zero byte there, it allocates a new GC'd string, copies the existing one over, appends the zero, and returns that. That's why the note in the documentation warns about the C function keeping it. If the C function keeps it beyond execution, it isn't the stack that will get it - it is the D garbage collector. D's GC cannot see memory allocated by C functions (unless specifically informed about it) and will think the string is unreferenced next time it runs and thus free it, leading to a use-after-free bug.
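If a C function does hold on to the pointer, one option is to keep a reference to it somewhere on the D side for as long as C needs it. Another is to register it with the GC explicitly. Here is a minimal sketch of the second approach using `GC.addRoot` and `GC.removeRoot` from `core.memory`; the helper names `pinForC` and `unpinFromC` are hypothetical, and the C call itself is left as a comment:

```d
import core.memory : GC;
import core.stdc.string : strlen;
import std.string : toStringz;

// Hypothetical helper: make a C string and register it as a GC root.
immutable(char)* pinForC(string s)
{
    immutable(char)* z = s.toStringz;
    GC.addRoot(z); // the GC now treats the pointed-to block as referenced
    return z;
}

// Hypothetical helper: call once the C side no longer holds the pointer.
void unpinFromC(immutable(char)* z)
{
    GC.removeRoot(z);
}

void main()
{
    auto z = pinForC("config.ini");
    // ... pass z to the C function that stores it ...
    assert(strlen(z) == 10);
    unpinFromC(z);
}
```

Forgetting the `removeRoot` call just means the block is never collected, which is a leak and not a use-after-free.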

The scope keyword in the signature is D's way of checking this btw: it means the argument will only be used in this function's scope (though the combination with return means it will only be used in this function's scope OR returned through this function). But that's on toStringz's input - the C function you call probably doesn't use that D language restriction and thus it would not be automatically caught.

So to sum up the attributes again:

scope - the argument will not leave the function's scope. Won't be assigned to a global or an external structure, etc.

return - the argument might be returned by the function.

return scope - hybrid of the above; it will not leave the function's scope EXCEPT through the return value.
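A small sketch of the three cases, with made-up function names. Note that, as far as I know, the compiler only enforces these rules in `@safe` code compiled with `-preview=dip1000`; in plain code like this the attributes are accepted but act as documentation:

```d
int* global;

// scope: p must not outlive the call.
int readOnly(scope int* p)
{
    return *p;
}

// return scope: p may leave the function only through the return value.
int* passThrough(return scope int* p)
{
    return p;
}

// No scope: the function is free to store the pointer elsewhere.
void escape(int* p)
{
    global = p;
}

void main()
{
    int x = 42;
    assert(readOnly(&x) == 42);

    int* q = passThrough(&x);
    assert(q is &x);

    auto h = new int(7); // heap value, safe to store in a global
    escape(h);
    assert(*global == 7);
}
```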

Adam D. Ruppe
  • But it returns `immutable(char)*`. I thought pointers (unlike arrays) are not GC-managed! – porton Dec 30 '18 at 18:39
  • Both arrays and pointers may or may not be GC managed. It depends on what they point to, so you can't tell from the type signature. You can reassign the same variable to differently managed things, like `int* a = new int; /* gc */ int b; a = &b; /* not gc, points to stack */ a = cast(int*) malloc(int.sizeof); /* not gc, points to C memory */`. Similarly, you can have arrays on the stack, or into malloc'd memory, or anything else. (A D array is just a `struct { size_t length; T* ptr; }` object internally.) – Adam D. Ruppe Dec 30 '18 at 19:42

Adam, what’s the recommended way for guarding against the use-after-free bug?

Could something be built into the compiler so that it squeals loudly if your D code does not implement the recommended guard? Or, far better, could D automatically create a pointer somewhere referencing the copy of the string plus its terminating 0?

Would everyone go mad with rage if the D compilers over-allocated strings and arrays of xchar by 1 byte, plus maybe a bit more for alignment reasons, and put a zero in the byte(s) after the end? That way toStringz would be trivial, the horribly expensive memory block-copying would be gone, the bug would vanish, and everything would be C-compatible.

Cecil Ward