2

I am trying to write a function that loads the standard output of a command-line utility (image-magick) into a member of a struct. I figure that since images can be may MB, I might as well avoid copies as much as possible.

/// An image data container used internally.
/// Images are 8-bit single channel for now.
pub struct Image<'a> {
    /// Width of the image in pixels.
    pub width: u32,
    /// Height of the image in pixels.
    pub height: u32,
    /// The buffer containing the image data, as a slice.
    pub pixels: &'a [u8],
}

// Load a file by first using imageMagick to convert it to a .pgm file.
fn load_using_magick<'a>(path: Path) -> Image<'a> {
    use std::io::Command;

    let output:IoResult<ProcessOutput> = Command::new("convert")
        .arg("-format pgm")
        .arg("-depth 8")
        .arg(path)
        .arg("-")
        .output();
    let output_unwrapped:ProcessOutput = match output {
        Ok(o) => o,
        Err(e) => panic!("Unable to run ImageMagick's convert tool in a separate process! convert returned: {}", e),
    };

    let bytes: &[u8] = match output_unwrapped.status.success() {
        false => panic!("signal or wrong error code from sub-process?"),
        true => output_unwrapped.output.as_slice(),
    };
    // Note, width and height will eventually get parsed out of "bytes"
    // and the returned Imaeg.pixels will point to a slice of the (already)
    // heap allocated memory.  I just need to figure out ownership first...
    Image{width:10,height:10,pixels:bytes}

}

The big chunk of heap that I would like to keep around was allocated by the standard library (or maybe the kernel?) when the standard library std::io::Process::Command::output() is called.

Compilation is failing with the borrow checker:

src/loaders.rs:41:17: 41:40 error: `output_unwrapped.output` does not live long enough
src/loaders.rs:41         true => output_unwrapped.output.as_slice(),
                                  ^~~~~~~~~~~~~~~~~~~~~~~
src/loaders.rs:21:51: 48:2 note: reference must be valid for the lifetime 'a as defined on the block at 21:50...
src/loaders.rs:21 fn load_using_magick<'a>(path: Path) -> Image<'a> {
src/loaders.rs:22     use std::io::Command;
src/loaders.rs:23 
src/loaders.rs:24     let output:IoResult<ProcessOutput> = Command::new("convert")
src/loaders.rs:25         .arg("-format pgm")
src/loaders.rs:26         .arg("-depth 8")
                  ...
src/loaders.rs:21:51: 48:2 note: ...but borrowed value is only valid for the block at 21:50
src/loaders.rs:21 fn load_using_magick<'a>(path: Path) -> Image<'a> {
src/loaders.rs:22     use std::io::Command;
src/loaders.rs:23 
src/loaders.rs:24     let output:IoResult<ProcessOutput> = Command::new("convert")
src/loaders.rs:25         .arg("-format pgm")
src/loaders.rs:26         .arg("-depth 8")

This makes some sense to me; whatever is owning the chunk of data I'm trying to keep around is going out of scope, leaving a dangling pointer in the struct I returned by value. So how do I actually transfer ownership of the memory region to my struct before I return it?

I already read the Rust Lifetimes Manual.

Shepmaster
  • 388,571
  • 95
  • 1,107
  • 1,366
Andrew Wagner
  • 22,677
  • 21
  • 86
  • 100
  • My code is here if it's easier to just check it out: https://github.com/drewm1980/rust-omr/commit/8f7f558f0b7cab171b690c74b752c2bf859403c6 (run "cargo test") – Andrew Wagner Nov 25 '14 at 18:40
  • 1
    Shouldn't you be using a boxed value? (I might be out of date or plain wrong) – Logan Nov 25 '14 at 22:14
  • @logan, delnan suggested below sticking the Vec into my struct. I ~think that counts as a boxed type. I also just tried sticking the Vec into a Box and sticking ~that in my struct. Neither of these allow me to keep around a slice pointing to the sub-array I care about into the struct. – Andrew Wagner Nov 26 '14 at 14:51

1 Answers1

5

Your signature claims that, whatever lifetime the caller desires, you can return an Image<'that_lifetime>. In effect, you are claiming to return Image<'static>, which contains a &'static [u8], i.e. a pointer to a byte slice that lives for the entire duration of the program execution. Obviously, the ProcessOutput from which you actually take the byte slice does not live that long.

The ownership of the memory region belongs to output_unwrapped, more specifically output_unwrapped.output (a Vec<u8>). You need to keep one of those two alive. The easiest option is to give the ownership to Image: Make pixels a Vec<u8>, which you move out of output_unwrapped. Of course, this has lasting impact on all code handling Images, so this is a design issue. Does it make sense for the image to own the contents? Does lack of borrowing cause problems for any intended use cases? (If you can't answer these questions, you can always go ahead and try it, though you may only find out weeks later.)

One alternative is to abstract over ownership, i.e. allow both vectors and slices. This is even supported in the standard library via the std::borrow::Cow<[u8]> type. You just have to decide if it's worth the extra trouble — especially since pixels is, for some reason (probably not a good one?), public.

trent
  • 25,033
  • 7
  • 51
  • 90
  • Thanks for the response! I'm still very much in the design phase, not to mention, the "learning rust" phase. Your solution sounds like it should work. Do you know if it's possible for a slice to own a Vec while still only pointing to the part of it I care about? Or should I stick both the Vec and the pixels in my Image struct? The Vec contains the image header. – Andrew Wagner Nov 25 '14 at 20:54
  • 1
    @AndrewWagner Having some owned data and a borrowed pointer to the same struct doesn't work too well. What I'd do instead (though I'm not experienced enough to strongly recommend this) is take the Vec and store the range of indices I care about, then construct a slice `v[start..end]` on the fly when actually working on the pixels. (This can, of course, be packaged into a nice little helper type.) Well, that or biting the bullet and copying the parts I care about out of the Vec. Zero-copy algorithms give me warm, fuzzy feelings but sometimes they're not worth it. –  Nov 25 '14 at 20:59
  • I think if I'm going to bother using a language with a high mental overhead for memory management, I ought to learn how to express the more efficient version straight off. If "don't needlessly copy the huge block of memory" is really that hard to express in Rust, then I'll probably try to file some sort of bug report about it. i.e. maybe Rust needs something like a global autorelease pool for data you know will hang around for the lifetime of your program. – Andrew Wagner Nov 26 '14 at 13:01
  • Do know know any other reasonable options for getting my struct to own the data loaded in from stdout of the external program? Maybe a Box<[u8]> or Rc<> or Gc<> or something? I might later want multiple threads to work on this data concurrently anyways... though I suppose this data will be owned by my main() context, and threads would borrow non-mutable references... – Andrew Wagner Nov 26 '14 at 13:06
  • @AndrewWagner This is quickly spiraling out of scope for a single question, but in short: The problem is that the memory for the whole output are put in a single allocation, but you want to take ownership of only a part of the block. At the very least, something needs to preserve the knowledge of the start (and real size) of the block, to be able to free it. `ProcessOutput` collects the output in a `Vec` so there's no way around the `Vec` as long as you use `ProcessOutput`. For sharing between multiple threads, you should use `Arc<..>` (assuming read-only access for *all* threads). –  Nov 26 '14 at 13:33
  • Ok, I tried your suggestion to just stick the Vec in my struct and it compiles, so while I think there has to be a better way, I'm going to accept it as a valid solution to the problem. Thanks! – Andrew Wagner Nov 26 '14 at 14:28
  • I did also try sticking ~both the vec and a slice into the struct, and as you predicted, I still ran into lifetime trouble. I also tried sticking the Vec into an enum variant so I could abstract over the type owning the buffer for me, but that didn't work either; maybe enums don't own their data; I asked that in a separate question. – Andrew Wagner Nov 26 '14 at 14:43