
I'm looking through some old (~2014) Rust code and I'm seeing this code block:

fn compile(self, func:&UncompiledFunction<'a>) -> &'a Val {
    unsafe {
        use std::raw::Repr;
        use std::mem::transmute as cast;
        let slice = self.repr();
        let ty = <&'a str as Compile<'a>>::get_type();
        let structure = Val::new(func, &ty);
        let offset_data = cast::<_, usize>(&slice.data) - cast::<_, usize>(&slice);
        let offset_len = cast::<_, usize>(&slice.len) - cast::<_, usize>(&slice);
        func.insn_store_relative(structure, offset_data, func.insn_of(mem::transmute::<_, isize>(slice.data)));
        func.insn_store_relative(structure, offset_len, func.insn_of(slice.len));
        structure
    }
}

According to the docs and this GitHub discussion, std::raw::Repr and std::raw::Slice have been deprecated in favor of the std::slice functions.

As someone with only a beginner's understanding of the std library, I'm unsure how to translate these particular lines from the block above:

let slice = self.repr(); // `self` here is a `&'static str`
let offset_data = cast::<_, usize>(&slice.data) - cast::<_, usize>(&slice);
let offset_len = cast::<_, usize>(&slice.len) - cast::<_, usize>(&slice);

I was looking through the documentation for Repr with the hopes that I could produce an analogy with some function in the std::slice family, but nothing is immediately clear to me.

I'm hoping someone can explain to me what exactly Repr does (in different language) and what a more updated approach might be.

Shepmaster
Matt
  • `use std::mem::transmute as cast;` Ugh. Please don't do that. – trent Sep 28 '18 at 17:15
  • What is the type of `self` in your example? For what it's worth, the `repr()` method was simply an alias for `std::mem::transmute_copy()`, which is still available today. – Sven Marnach Sep 28 '18 at 17:15
  • @SvenMarnach a `static str` – Matt Sep 28 '18 at 17:18
  • @trentcl This code interfaces with `libjit`. This low-level sort of development is _really_ new to me. I know `transmute` is a scary thing to use, but considering the FFI, is there a safe/cleaner way of doing it? – Matt Sep 28 '18 at 17:20
  • It's not that you shouldn't use `transmute` (although that doesn't really look necessary here either), just that you shouldn't rename it to `cast`. It saves a couple keystrokes at the expense of making everybody who ever reads the code say, "huh?" – trent Sep 28 '18 at 17:22
  • @trentcl Gotcha. Thanks! – Matt Sep 28 '18 at 17:23
  • I think more context is required. This looks like part of the implementation of a typemap, or something reflective like that. It's using the offset of the fields of a slice reference, which is pretty sketchy. I wonder if Code Review would be a better place for this. – trent Sep 28 '18 at 17:35
  • I don't think the layout of a slice in memory is guaranteed in any way by Rust. You should probably use your own type instead of `&'static str`. If this is not possible for some reason, you can use `offset_data = 0` and `offset_len = std::mem::size_of::<*const u8>()`. This will break if the internal layout of a slice is changed by Rust, but so will any other solution to get these offsets. – Sven Marnach Sep 28 '18 at 17:42
  • @trentcl this post on CR would not be [on-topic](https://codereview.stackexchange.com/help/on-topic) because: "_For licensing, moral, and procedural reasons, we cannot review code written by other programmers. We expect you, as the author, to understand why the code is written the way that it is._" – Sᴀᴍ Onᴇᴌᴀ Sep 28 '18 at 17:45
  • @trentcl Yeah I just realised. There would have to be some inverse of `from_raw_parts()`, but returning pointers, even for the size. – Peter Hall Sep 28 '18 at 18:34

1 Answer


For x of type &[T] or &str:

  • The replacement for x.repr().data is x.as_ptr().
  • The replacement for x.repr().len is x.len().
  • The replacement for transmuting from std::raw::Slice back to &[T] or &str is std::slice::from_raw_parts (and optionally std::str::from_utf8_unchecked).
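As a minimal sketch of the three replacements above (the helper names `decompose`, `recompose`, and `skip_bytes` are made up for illustration and are not part of std), a `&str` can be split into its pointer and length and rebuilt without assuming anything about how the reference itself is laid out:

```rust
use std::{slice, str};

/// Splits a string slice into pointer and length using the stable accessors
/// that replace `repr().data` and `repr().len`.
fn decompose(s: &str) -> (*const u8, usize) {
    (s.as_ptr(), s.len())
}

/// Rebuilds a `&str` from a pointer and a length (hypothetical helper).
///
/// # Safety
/// `ptr` must point to `len` bytes of valid UTF-8 that stay alive and
/// unmodified for the whole lifetime `'a`.
unsafe fn recompose<'a>(ptr: *const u8, len: usize) -> &'a str {
    // SAFETY: the caller upholds the requirements listed above.
    unsafe { str::from_utf8_unchecked(slice::from_raw_parts(ptr, len)) }
}

/// Drops the first `n` bytes by adjusting the components and rebuilding,
/// rather than by writing into the fields of the reference.
fn skip_bytes(s: &str, n: usize) -> &str {
    // Also rejects `n > s.len()`, since that is never a char boundary.
    assert!(s.is_char_boundary(n), "n must lie on a char boundary");
    let (ptr, len) = decompose(s);
    // SAFETY: `ptr + n` stays inside `s`, and the remaining bytes are valid
    // UTF-8 because `n` is a char boundary.
    unsafe { recompose(ptr.add(n), len - n) }
}
```

For example, `skip_bytes("hello world", 6)` yields `"world"`. In ordinary code you would just write `&s[6..]`; the point here is only to show the pointer/length round trip.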

However, this code does not just access the pointer and the length: it takes the addresses of those fields in order to compute their offsets, presumably to later do some unsafe/unchecked memory reads or writes.

The unhelpful answer is: don’t do this. std::raw::Slice was removed precisely because we didn’t want to stabilize the exact memory layout of &[T] and &str. If at all possible, consider refactoring the code to avoid these unchecked memory accesses and instead, for example, replace the whole string with std::str::from_utf8_unchecked(std::slice::from_raw_parts(new_pointer, new_len)).
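If the generated code only needs a pointer/length pair at known offsets, another option (along the lines of what was suggested in the comments on the question) is to use a struct of your own instead of `&str`. The following is a minimal sketch, assuming a hypothetical `RawStr` type that is not part of std or of the library in question; `#[repr(C)]` is what makes the field order and offsets well defined:

```rust
/// Hypothetical pointer/length pair. Unlike `&str`, its field order and
/// offsets are specified because of `#[repr(C)]`.
#[repr(C)]
struct RawStr {
    data: *const u8,
    len: usize,
}

impl RawStr {
    /// Captures the components of a string via the stable accessors.
    fn new(s: &'static str) -> RawStr {
        RawStr { data: s.as_ptr(), len: s.len() }
    }
}

/// Computes the byte offsets of `data` and `len` from field addresses,
/// in the same spirit as the original code but on a type we control.
fn field_offsets() -> (usize, usize) {
    let probe = RawStr { data: std::ptr::null(), len: 0 };
    let base = &probe as *const RawStr as usize;
    let offset_data = &probe.data as *const *const u8 as usize - base;
    let offset_len = &probe.len as *const usize as usize - base;
    (offset_data, offset_len)
}
```

On toolchains that have it, `std::mem::offset_of!` would be a shorter way to get the same numbers; the address arithmetic above is used only so that the sketch does not depend on a recent compiler.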

The practical answer is that the memory layout is very unlikely to change, and you’ll probably be ok if you hard-code:

let offset_data = 0;
let offset_len = std::mem::size_of::<usize>();
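If you do hard-code the offsets, it may be worth checking the assumption once at startup or in a test. This is only a sketch: the function name is invented, and it deliberately peeks at the unspecified representation of `&str` (via `std::mem::transmute_copy`, which the comments above mention as what `repr()` amounted to) to confirm that the pointer comes first and the length second on the current compiler and target:

```rust
use std::mem;

/// Returns true if, on this compiler and target, a `&str` is stored as
/// the data pointer followed by the length, i.e. the hard-coded offsets
/// `0` and `size_of::<usize>()` are correct.
fn layout_matches_hardcoded_offsets(s: &str) -> bool {
    // Without this, the copy below would read the wrong number of bytes.
    if mem::size_of::<&str>() != 2 * mem::size_of::<usize>() {
        return false;
    }
    // SAFETY: the sizes were just checked to be equal, and any bit
    // pattern is a valid `usize`. This depends on unspecified layout,
    // which is exactly what is being tested.
    let words: [usize; 2] = unsafe { mem::transmute_copy(&s) };
    // words[0] sits at offset 0, words[1] at offset size_of::<usize>().
    words[0] == s.as_ptr() as usize && words[1] == s.len()
}
```

Calling this once and panicking (or refusing to JIT) when it returns `false` turns a silent memory-corruption bug into a loud failure if the layout ever changes.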
Shepmaster
Simon Sapin
  • Thanks for this, Simon! I'll pore over it in a few hours and see if I understand it fully. As a bit of backstory, this code came from a Rust [jit library](https://github.com/TomBebbington/jit.rs) which was/is interfacing with `libjit`. Dunno if that means anything to you or changes your opinion on some things, but there you have it. Thanks again! – Matt Oct 01 '18 at 20:23
  • I’m not familiar with libjit, but I’m not surprised that a JIT would want to do such unchecked memory accesses. I suppose that replacing them entirely would be a big rewrite and you’ll likely want the practical answer. – Simon Sapin Oct 02 '18 at 04:45
  • Do you have any recommendations for a more idiomatic and safe way of messing around at this low of a level? I suppose with this interface you can only be so safe, but I'm relatively new to this realm of programming. – Matt Oct 02 '18 at 16:59
  • There is some inherent tension between safe and low-level. The nature of a JIT is to generate machine code that is completely outside of Rust’s type checkers, for example. Rather than manipulating the individual fields of `&str` (the pointer and length), try to manipulate it as a whole, as an opaque type whose size is twice the size of a pointer. `.as_ptr()` and `.len()` access the components, and `from_raw_parts` creates a new value with potentially-modified components, but neither requires assuming the exact memory layout inside of `&str`. – Simon Sapin Oct 02 '18 at 20:40