18

I'm just asking why Rust decided to use &str for string literals instead of String. Isn't it possible for Rust to just automatically convert a string literal to a String and put it on the heap instead of putting it into the stack?

  • 2
    Theoretically, sure. But it would be waaaaay slower and what would be the advantage? – trent Aug 25 '20 at 05:05
  • 4
    "put it on the heap instead of putting it into the stack", I guess string literals are placed in the `.rodata` section. – MaxV Aug 25 '20 at 05:54
  • 2
    Re: the close vote, I don't agree that this question leads to opinion-based answers. There are very clear reasons for the design. – Peter Hall Aug 25 '20 at 10:36

2 Answers

20

To understand the reasoning, consider that Rust wants to be a systems programming language. In general, this means that it needs to (among other things) (a) be as efficient as possible and (b) give the programmer full control over allocations and deallocations of heap memory. One use case for Rust is embedded programming, where memory is very limited.

Therefore, Rust does not want to allocate heap memory where this is not strictly necessary. String literals are known at compile time and can be written into the read-only data section of an executable/library (`.rodata` in ELF binaries), so they don't consume stack or heap space.

Now, given that Rust does not want to allocate the values on the heap, it is basically forced to treat string literals as &str: Strings own their values and can be moved and dropped, but how do you drop a value that lives in read-only data? You can't really do that, so &str is the perfect fit.

Furthermore, treating string literals as &str (or, more accurately, &'static str) has all the advantages and none of the disadvantages. They can be used in multiple places, can be shared without using any heap memory, and never have to be freed. Also, they can be converted to owned Strings at will, so having them available as String is always possible, but you only pay the cost when you need to.
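As a small illustration of "you only pay the cost when you need to", here is a minimal sketch. The helper name `shout` is made up for this example; the point is only that the literal can be shared freely as a `&'static str`, and a heap allocation happens at the one place where an owned, mutable copy is requested.

```rust
// Hypothetical helper: the only place where a heap allocation happens.
fn shout(s: &str) -> String {
    let mut owned = s.to_owned(); // allocate on the heap and copy the bytes
    owned.make_ascii_uppercase(); // mutate the owned copy in place
    owned
}

fn main() {
    // A string literal has type &'static str: a borrowed view of bytes
    // that are part of the program image.
    let a: &'static str = "hello";
    let b: &'static str = a; // sharing copies only the reference

    // Both references point at the same bytes.
    assert_eq!(a.as_ptr(), b.as_ptr());

    let loud = shout(a);
    assert_eq!(loud, "HELLO");
    assert_eq!(a, "hello"); // the literal itself is untouched
}
```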

Paul
  • Why is it a problem that the value in `.rodata` cannot be dropped? Could Rust just pretend it is dropped and carry on, or would that cause problems? (Edit: I'm actually wondering why str exists at all, and string literals seem to be an important part of the answer.) – BlackShift Dec 08 '20 at 12:36
  • 1
    The problem is that `String` owns its data and frees that data when it is dropped itself. It might be possible to check whether the data it owns is in `.rodata` (though that might be difficult to do cross-platform) and then skip the deallocation, but that would make the implementation much more complicated. The types `String` and `str` have their equivalents in `Vec<T>` and `[T]`. With Rust's model of ownership and shared references you really need something like `&str`, not just because of string literals. – Paul Dec 08 '20 at 14:35
  • 1
    Also, consider that `String` owns and can modify its data, which is not really something you want to do with a string literal / thing in `.rodata`. – Paul Dec 08 '20 at 14:39
  • Thanks @Paul. I didn't know that even a non-mut String has to assume its data is writable (because it can be made mut by its owner). And string literals have to be read-only for performance reasons (because then they can stay in `.rodata`). So there have to be two types to represent strings. – BlackShift Dec 08 '20 at 15:27
  • Hi, I am a newbie. You said string literals have none of the disadvantages. Regarding reverse engineering of Rust code, do string literals make reverse engineering easier than `String` does? I don't know much about reverse engineering either, but I always want the program to be harder to reverse engineer. – sgon00 Mar 25 '22 at 14:51
  • @sgon00 I don't know - I'm not a reverse engineering expert. – Paul Mar 27 '22 at 09:58
11

To create a String, you have to:

  • reserve a place on the heap (allocate), and
  • copy the desired content from a read-only location to the freshly allocated area.

If a string literal like "foo" did both, every such string would effectively be stored twice: once inside the executable as the read-only string, and a second time on the heap. You simply couldn't just refer to the original read-only data stored in the executable.
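The two steps can be observed directly. This is a minimal sketch; the function name `owned_copy_is_distinct` is made up for illustration. It builds a `String` from a literal and checks that the owned bytes live at a different address than the literal's bytes, i.e. the content now exists in two places.

```rust
// Hypothetical helper: returns true if the String built from `literal`
// stores its bytes somewhere other than the literal itself.
fn owned_copy_is_distinct(literal: &'static str) -> bool {
    // Step 1 (allocate) and step 2 (copy) both happen inside to_owned().
    let owned: String = literal.to_owned();

    // Same content, stored a second time.
    assert_eq!(owned, literal);
    assert!(owned.capacity() >= literal.len());

    owned.as_ptr() != literal.as_ptr()
}

fn main() {
    // The literal stays in the program image; the String gets its own
    // heap buffer holding a copy of the same three bytes.
    assert!(owned_copy_is_distinct("foo"));
}
```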

&str literals give you access to the most efficient string data: the one present in the executable image on startup, put there by the compiler along with the instructions that make up the program. The data it points to is not stored on the stack; what is stack-allocated is just the pointer/size pair, as is the case with any Rust slice.
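The "pointer/size pair" can be checked with `std::mem::size_of`. This is a sketch, assuming the usual representation of slice references as two machine words, which is how current Rust compilers lay them out; the helper name `str_ref_words` is made up for illustration.

```rust
use std::mem::size_of;

// Hypothetical helper: how many machine words a &str occupies.
fn str_ref_words() -> usize {
    size_of::<&str>() / size_of::<usize>()
}

fn main() {
    // A &str is a (pointer, length) pair, the same shape as any slice reference.
    assert_eq!(str_ref_words(), 2);
    assert_eq!(size_of::<&str>(), size_of::<&[u8]>());

    // Only this pair lives in the local variable; the bytes "foo" do not.
    let s: &'static str = "foo";
    assert_eq!(s.len(), 3);
}
```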

Making "foo" desugar into what is now spelled "foo".to_owned() would make it slower and less space-efficient, and would likely require another syntax to get a non-allocating &str. After all, you don't want x == "foo" to allocate a string just to throw it away immediately. Languages like Python alleviate this by making their strings immutable, which allows them to cache strings mentioned in the source code. In Rust, mutating a String is often the whole point of creating it, so that strategy wouldn't work.
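For the `x == "foo"` case, a short sketch of how this looks today. The function name `is_foo` is made up for illustration. The comparison works on borrowed data on both sides, so no temporary `String` is created for the literal.

```rust
// Hypothetical helper: compares against a literal without allocating.
fn is_foo(x: &str) -> bool {
    x == "foo" // byte-wise comparison of two borrowed strings
}

fn main() {
    let owned = String::from("foo");

    // A String can be borrowed as &str and compared with the literal.
    assert!(is_foo(&owned));

    // String also implements PartialEq<&str>, so this works directly too.
    assert!(owned == "foo");

    assert!(!is_foo("bar"));
}
```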

user4815162342
  • "You simply couldn't just refer to the original read-only data stored in the executable." Why not? I would assume that to be possible as long as the String is not mut. And if the data is to be mutated, then it has to be copied anyway. So it seems that either way using a String should work without penalty. What am I missing? – BlackShift Dec 08 '20 at 12:31
  • 1
    @BlackShift *Why not?* - Because `String` is guaranteed to refer to heap-allocated data. You can even convert it to [`Box<str>`](https://doc.rust-lang.org/std/string/struct.String.html#method.into_boxed_str) and [`Vec<u8>`](https://doc.rust-lang.org/std/string/struct.String.html#method.into_bytes) without reallocation. *I would assume that to be possible as long as the String is not mut.* - there is no such thing as a non-mut `String` - as long as you own it, you can always [make it mut](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=e3048d13d2fec870277fdb85d28aaa61). – user4815162342 Dec 08 '20 at 13:14
  • Thank you @user4815162342. I did not know that you can make immutable variables mutable. I guess it makes sense, because an important reason for immutability/consts is to ensure that different parts of a program do not mangle data that other parts rely on. Rust solves that already. – BlackShift Dec 08 '20 at 14:17
  • `let mut s = s` [reuses the same memory location](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=4bfd18e0cc0135c7381c17c23ccd629f), so it indeed effectively makes the original String mut (instead of e.g. copying the data). So things will break if the non-mut String could refer to read-only data, something I didn't expect, thanks again. – BlackShift Dec 08 '20 at 15:09
  • @BlackShift It must necessarily reuse the same memory location because Rust's moves are always bitwise copies of the struct itself - there are no move (or copy) constructors that could duplicate pointed-to data. Given its public API, a `String` is bound to be implemented as a triple of (pointer, capacity, length), and moving it just copies those three values and marks the old ones as dead, so the compiler doesn't try to `Drop` them. – user4815162342 Dec 08 '20 at 15:15
  • 1
    Yeah, I figured that must be what was going on, so I wanted to prove it (and share the proof here for future readers). – BlackShift Dec 08 '20 at 15:42
  • I went looking for functions that modify `str` in place, to see what would happen if those are called on a string literal. Apparently these do exist, e.g. `make_ascii_uppercase()`, which requires a `&mut str`. But the only way I've been able to create a `&mut str` is to copy a `&str` into a heap-allocated structure like a `String` or a `Box<str>`, because `let mut s = s;` does not work on `&str`: 'it is behind an & reference'. So it does seem impossible to call such modify-in-place functions on a string literal. – BlackShift Dec 08 '20 at 20:09
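
For future readers, the points from this comment thread can be put into one small sketch. The function name `uppercase_copy` is made up for illustration. It shows that rebinding an owned `String` as `mut` keeps the same heap buffer, and that in-place mutation through `&mut str` works on that owned copy while the literal stays unchanged.

```rust
// Hypothetical helper: copies a literal to the heap, then mutates the copy.
fn uppercase_copy(literal: &'static str) -> String {
    let s = String::from(literal); // immutable binding, owned heap buffer
    let before = s.as_ptr();

    // Moving into a mutable binding copies only (pointer, capacity, length);
    // the heap buffer itself stays where it was.
    let mut s = s;
    assert_eq!(before, s.as_ptr());

    // &mut str is available because we own a writable buffer.
    s.as_mut_str().make_ascii_uppercase();
    s
}

fn main() {
    let lit = "foo";
    let upper = uppercase_copy(lit);
    assert_eq!(upper, "FOO");
    assert_eq!(lit, "foo"); // the literal is still the original bytes
}
```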