
I am trying to write a trait that provides gzip encoding/decoding for arbitrary (de)serializable structs. My primary use case is persisting some stateful struct to disk through a clean API. To that end, whenever a struct S implements serde's Serialize and Deserialize and our trait is in scope, a gzipped, serialized copy of it should be writable to, and readable from, anything that implements Read/Write, on demand.

For example:

A trait that describes the API for reading and writing some (de)serializable struct.

use flate2::read::GzDecoder;
use flate2::write::GzEncoder;
use flate2::Compression;
use serde::{Serialize, Deserialize};
use rmp_serde::Serializer;
use std::hash::Hash;
use std::io::{Read, Write};

pub type Result<T, E = std::io::Error> = std::result::Result<T, E>;

pub trait ReadWriteState<S: Serialize + Deserialize> {
    /// Write the given persistent state to a stream.
    fn write_state(&mut self, state: &S) -> Result<usize>;
    /// Read persistent state back from a stream.
    fn read_state(&mut self) -> Result<S>;
}

A blanket implementation of ReadWriteState for any (de)serializable state, via anything that is both std::io::Read and std::io::Write.

impl<S, T> ReadWriteState<S> for T 
where
    S: Serialize + Deserialize, // This doesn't work because of lifetimes in serde Deserializer.
    T: Read + Write
{
    /// Serialize the state into messagepack and then
    /// GzEncode it before sending to the output stream.
    fn write_state(&mut self, state: &S) -> Result<usize> {
        let mut buf = Vec::new();
        state
            .serialize(&mut Serializer::new(&mut buf))
            .expect("Could not serialize data.");

        let mut e = GzEncoder::new(Vec::new(), Compression::default());
        e.write_all(&buf)?;

        let compressed_bytes = e.finish()?;
        let length = compressed_bytes.len();

        self.write_all(&compressed_bytes)?;
        Ok(length)
    }

    /// Decode the gzipped stream into msgpack and then further deserialize it into the generic state struct.
    fn read_state(&mut self) -> Result<S> {
        let mut decoder = GzDecoder::new(self);
        let mut buf = Vec::new(); // The buf is created here so it is owned by this function scope.
        decoder.read_to_end(&mut buf)?;
        // Decode with rmp_serde to match the MessagePack encoding used in write_state.
        rmp_serde::from_slice::<S>(&buf)
            .map_err(|e| std::io::Error::new(std::io::ErrorKind::InvalidData, e)) // (*)

        // This is what I expect should work fine 
        // but the borrow checker complains that 
        // `buf` doesn't live long enough.
    }

}

A sample stateful struct that is (de)serializable by serde_derive macros.

// Now suppose we have some struct that is Serialize as
// well as Deserialize.

#[derive(Clone, Debug, Serialize, Deserialize, PartialEq, Eq)]
pub struct FooMap<K, V> 
where
    // For a moment, suppose Deserialize doesn't need a lifetime.
    // To compile, it should look more like Deserialize<'a> for some defined
    // lifetime 'a, but let's ignore that for a moment.
    K: Clone + Hash + Eq + Serialize + Deserialize,
    V: Eq + Serialize + Deserialize
{
    pub key: K,
    pub value: V
}

The convenient disk persistence API for our FooMap in action.

// Now I should be able to write gzipped + messagepacked FooMap to file.

pub fn main() {
    let foomap = FooMap {
        key: "color",
        value: "blue"
    };
    let mut file = std::fs::File::create("/tmp/foomap.gz").expect("Could not create file.");
    let bytes_written = file.write_state(&foomap).expect("Could not write state.");
    println!("{} bytes written to /tmp/foomap.gz", bytes_written);

    let mut file = std::fs::File::open("/tmp/foomap.gz").expect("Could not open file.");
    let recovered: FooMap<&str, &str> = file.read_state().expect("Could not recover FooMap.");
    assert_eq!(foomap, recovered);
}

You may notice a few problems with the code above. The one I'm aware of is the missing lifetime annotation on Deserialize when it is used as a trait bound. Serde has an excellent write-up on Deserializer lifetimes.

I've put together a Playground that tries to address the lifetime issue, and in doing so I've run into another compiler error (*) that seems strange to me in this situation.

I'm really confused about where I took a wrong turn and how to correct it. I would really appreciate it if someone could help me understand the mistakes I've made in this implementation and how I can avoid them in the future.

Aalekh Patel
  • Maybe you should look at how other serde implementations are done. You can find a list of them at https://serde.rs/#data-formats. – jthulhu Apr 18 '22 at 08:19
  • Thank you. I realize now that I had misunderstood what lifetimes mean in the Deserializer context. The whole lifetime management that arises when "what you're deserializing from may share raw memory with what it is being deserialized into" went completely over my head in my first couple of reads, but now I seem to have a slightly better understanding than before. – Aalekh Patel Apr 18 '22 at 09:13

1 Answer


Your code compiles if you use DeserializeOwned as a bound instead of Deserialize<'de>.

Serde's page on lifetimes says this about DeserializeOwned (emphasis mine):

This means "T can be deserialized from any lifetime." The callee gets to decide what lifetime. Usually this is because the data that is being deserialized from is going to be thrown away before the function returns, so T must not be allowed to borrow from it. For example a function that accepts base64-encoded data as input, decodes it from base64, deserializes a value of type T, then throws away the result of base64 decoding. [...]

This exactly matches your use case, since buf is dropped before the function returns.
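To make the lifetime argument concrete without any crates, here is a std-only model. `FromBytes<'de>` and `FromBytesOwned` are hypothetical stand-ins for serde's `Deserialize<'de>` and `DeserializeOwned`; serde defines `DeserializeOwned` the same way, as a trait implemented for every `T: for<'de> Deserialize<'de>`. A function that parses from a buffer it creates and drops itself, like `read_state`, can only ask for the owned bound, and a borrowing type such as `&'de str` does not meet it.

```rust
// Hypothetical stand-in for serde's `Deserialize<'de>`: `'de` is the lifetime
// of the input bytes the value is parsed from.
trait FromBytes<'de>: Sized {
    fn from_bytes(input: &'de [u8]) -> Option<Self>;
}

// Zero-copy: the parsed value borrows from the input.
impl<'de> FromBytes<'de> for &'de str {
    fn from_bytes(input: &'de [u8]) -> Option<Self> {
        std::str::from_utf8(input).ok()
    }
}

// Owned: the parsed value copies the data out of the input.
impl<'de> FromBytes<'de> for String {
    fn from_bytes(input: &'de [u8]) -> Option<Self> {
        std::str::from_utf8(input).ok().map(str::to_owned)
    }
}

// Stand-in for `DeserializeOwned`: parseable from input of *any* lifetime.
trait FromBytesOwned: for<'de> FromBytes<'de> {}
impl<T> FromBytesOwned for T where T: for<'de> FromBytes<'de> {}

// Like `read_state`: `buf` is local and dropped before returning, so the result
// must not borrow from it. With `T: FromBytes<'de>` for a caller-chosen `'de`,
// this would fail with "`buf` does not live long enough".
fn parse_from_local<T: FromBytesOwned>(data: &[u8]) -> Option<T> {
    let buf = data.to_vec(); // e.g. the decompressed bytes
    T::from_bytes(&buf)
}

// When the caller owns the input, a per-lifetime bound is fine and allows
// zero-copy results that borrow from `input`.
fn parse_borrowed<'de, T: FromBytes<'de>>(input: &'de [u8]) -> Option<T> {
    T::from_bytes(input)
}

fn main() {
    let owned: Option<String> = parse_from_local(b"blue");
    println!("{:?}", owned); // Some("blue")

    let input = b"color".to_vec();
    let borrowed: Option<&str> = parse_borrowed(&input);
    println!("{:?}", borrowed); // Some("color")

    // `parse_from_local::<&str>(b"blue")` would not compile:
    // `&str` is not `FromBytes<'de>` for every `'de`.
}
```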

jonasbb
  • **UPDATE** If you'd like to see a [compiling Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=cd3a307a4f7f2047e91811cd7fe01d72), I've updated it to the corrected version that this answer describes. – Aalekh Patel Apr 18 '22 at 09:15