Why is casting a string to ubyte[] forced to be unsafe in D?

And why is the same conversion using std.conv.to unsafe and not nothrow?

I can well understand that the opposite (casting ubyte[] to a UTF-8 string) must be unsafe and may throw, but why this case?

Why can't I safely investigate these individual bytes?

Nordlöw

3 Answers

Casting string -> ubyte[] is disallowed in safe mode because it casts away the immutability of string. cast(immutable(ubyte)[]) some_string is permitted. You can also look at the bytes of a string with foreach(char c; some_string) { /* look at c */ }.
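As a minimal sketch of the two options above: the helper names asBytes and countCodeUnits are hypothetical and exist only for illustration. It assumes a reasonably recent compiler that accepts the qualifier-preserving cast in @safe code.

```d
// Hypothetical helper: views a string's UTF-8 code units as bytes.
// The immutable qualifier is kept, so nothing is cast away.
@safe immutable(ubyte)[] asBytes(string s)
{
    return cast(immutable(ubyte)[]) s;
}

// Hypothetical helper: counts code units by iterating as char, with no cast.
@safe size_t countCodeUnits(string s)
{
    size_t n;
    foreach (char c; s)
        ++n;
    return n;
}

@safe void main()
{
    string s = "h\u00e9llo"; // U+00E9 occupies two UTF-8 code units

    assert(asBytes(s).length == 6);
    assert(countCodeUnits(s) == 6);
    assert(asBytes("abc")[0] == 0x61);

    // auto bad = cast(ubyte[]) s; // rejected in @safe: casts away immutable
}
```

Both approaches read the same memory as the string and allocate nothing.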

to!(ubyte[])(some_string) worked for me, even in safe mode, though it can throw. The reason is that the appender helper function in std.array isn't properly marked nothrow, which is a Phobos bug. If you add nothrow to std/array.d line 2662 (in dmd 2.064.2), it will then compile as nothrow too.

Adam D. Ruppe

@safe has nothing to do with UTF-8 vs ASCII; it is entirely about memory safety. It guarantees things such as never operating on released or corrupted memory. Many kinds of casts are considered @system (unsafe) because they risk violating memory safety. For those, the programmer has to verify that the cast doesn't actually violate memory safety and mark the function as @trusted, which tells the compiler that it's actually safe and makes it usable in @safe code.

As for casting string to ubyte[], casting away immutable like that is circumventing the type system. It makes it so that you have to guarantee that your code isn't mutating the data, and if you don't, you've violated the compiler's guarantees, and you're going to have bugs. I'd suggest looking at

What is the difference between const and immutable in D?

Logical const in D

To make a long story short, don't cast away const or immutable unless you really need to, and you know what you're doing.

std.conv.to will dup an array if it has to in order to do a conversion, so to!(ubyte[])("hello world") is going to result in a new ubyte[] with the same values as "hello world", but it won't be the same array, and therefore won't violate the type system. If it had been to!(immutable(ubyte)[])("hello world"), then it could cast the array, but it can't as long as you're converting to a mutable array.
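The difference between a copy and a reinterpreting cast can be sketched as follows. Note the assumption: this sketch does not call std.conv.to; it uses a qualifier-preserving cast plus the built-in .dup to show the same idea, and mutableCopy is a hypothetical helper name.

```d
// Hypothetical helper: returns a freshly allocated, mutable copy of the
// string's bytes. The original immutable data is never written to.
@safe ubyte[] mutableCopy(string s)
{
    return (cast(immutable(ubyte)[]) s).dup;
}

@safe void main()
{
    string s = "hello world";

    ubyte[] copy = mutableCopy(s);
    assert(copy.length == s.length);

    copy[0] = 0x48;         // ASCII 'H'; mutating the copy is fine
    assert(copy[0] == 'H');
    assert(s[0] == 'h');    // the original string is untouched
}
```

Because the copy lives in its own allocation, writing to it cannot break the immutability guarantee on the original string.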

As for to!(ubyte[])("hello world") being @system and not nothrow, that's an implementation issue. It should definitely be possible for it to be @safe, and I think that it can be nothrow, since it shouldn't need to decode the characters. The underlying implementation doesn't support that, though, likely because much of the low-level code that Phobos uses doesn't support @safe, pure, or nothrow yet, even when it should. The situation is improving (e.g. with dmd 2.064, format can now be pure under some circumstances), but there's still quite a way to go.

As for "investigating the individual bytes," I don't see why you need to do any conversion at all. Just iterate over the string's elements. They're char, which is exactly the same size and signedness as ubyte. But if you really want to operate on the string as an array of ubyte without having to allocate, then just cast to the same constness as string - e.g.

auto arr = cast(immutable(ubyte)[])"hello world";

or

auto arr = to!(immutable(ubyte)[])("hello world");
Jonathan M Davis
  • Casting isn't necessarily unsafe, only certain kinds are disallowed. I think casting pointer types is always considered @system, but going between integer types (which includes char, byte, etc.) I'm pretty sure is always allowed. – Adam D. Ruppe Nov 11 '13 at 02:24
  • @AdamD.Ruppe That's true, though it's frequently true that casts are unsafe (particularly with generic code). It should come down to which casts the compiler can guarantee memory safety for and which it can't. Regardless, I updated my answer accordingly. – Jonathan M Davis Nov 11 '13 at 02:39
See also std.string.representation.
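A short sketch of how std.string.representation might be used here. It returns the code units as the matching unsigned integer type and keeps the qualifier of the input.

```d
import std.string : representation;

@safe void main()
{
    string s = "hello";

    // For a string, the result keeps immutable: immutable(ubyte)[].
    auto bytes = s.representation;
    static assert(is(typeof(bytes) == immutable(ubyte)[]));
    assert(bytes.length == 5);
    assert(bytes[0] == 0x68); // 'h'

    // For a mutable char[], the result is a mutable ubyte[].
    char[] m = "abc".dup;
    static assert(is(typeof(m.representation) == ubyte[]));
}
```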

Andrei Alexandrescu