
I need to serialize some data into files. To keep the files small, I want to use the default compact serializer of MessagePack (MsgPack), which writes only the field values and not their names. I also want to be able to change the data structure in future versions, which obviously can't be done without also storing some meta/versioning information. I imagine the most efficient way to do that is simply to use a "header" field for that purpose. Here is an example:

pub struct Data {
    pub version: u8,
    pub items: Vec<Item>,
}
pub struct Item {
    pub field_a: i32,
    pub field_b: String,
    pub field_c: i16,  // Added in version 3
}

Can I do something like that in rmp-serde (or perhaps some other crate)? That is, can I somehow annotate that a certain struct field should only be taken into account for specific file versions?

at54321
  • You mean you want previous versions of your application to somehow not fail at decoding the message when `Item` grew, just based on the `version` in `Data`? – Denys Séguret Dec 15 '21 at 06:36
  • @DenysSéguret Yes, exactly. In my example, I imagine `field_c` should be initialized with some sort of a default value when `version < 3`. – at54321 Dec 15 '21 at 06:46
  • If you really need such a size gain, you should probably consider branching before deserialization, so that you deserialize into a different set of structs. Your current approach seems hard to deal with, especially as adding fields is only one of the possible changes – Denys Séguret Dec 15 '21 at 06:58
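A minimal, std-only sketch of the branching approach suggested in the comment above, assuming one struct per file version. `ItemV2`, `ItemV3`, `Payload` and `into_current` are hypothetical names. In a real program each versioned struct would also derive `serde::Deserialize`, and the decoder would read `version` first and then deserialize into the matching struct; that serde step is left out so the example compiles with the standard library alone.

```rust
// Hypothetical per-version layouts. In practice each would also
// #[derive(serde::Deserialize)] and be decoded only for its own version.
#[derive(Debug, Clone, PartialEq)]
pub struct ItemV2 {
    pub field_a: i32,
    pub field_b: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ItemV3 {
    pub field_a: i32,
    pub field_b: String,
    pub field_c: i16,
}

// Upgrading an old record is an ordinary conversion, so each schema
// change gets its own small, testable step.
impl From<ItemV2> for ItemV3 {
    fn from(old: ItemV2) -> Self {
        ItemV3 {
            field_a: old.field_a,
            field_b: old.field_b,
            field_c: 42, // placeholder default, same value as in the answer below
        }
    }
}

// What the decoder would produce after branching on the version header.
pub enum Payload {
    V2(Vec<ItemV2>),
    V3(Vec<ItemV3>),
}

// Normalize any supported version to the current in-memory type.
pub fn into_current(payload: Payload) -> Vec<ItemV3> {
    match payload {
        Payload::V2(items) => items.into_iter().map(ItemV3::from).collect(),
        Payload::V3(items) => items,
    }
}

fn main() {
    let old = Payload::V2(vec![ItemV2 { field_a: 1, field_b: "x".to_string() }]);
    println!("{:?}", into_current(old));
}
```

Adding a version 4 then means adding `ItemV4`, a `From<ItemV3>` conversion and one more `Payload` variant, and the older conversions stay untouched.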

1 Answer


You can achieve this by writing a custom deserializer like this:

use serde::de::Error;
use serde::{Deserialize, Deserializer, Serialize};

#[derive(Serialize)]
pub struct Data {
    pub version: u8,
    pub items: Vec<Item>,
}

#[derive(Serialize)]
pub struct Item {
    pub field_a: i32,
    pub field_b: String,
    pub field_c: i16, // Added in version 3
}

impl<'de> Deserialize<'de> for Data {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        // Inner structs, used for deserializing only
        #[derive(Deserialize)]
        struct InnerData {
            version: u8,
            items: Vec<InnerItem>,
        }
        #[derive(Deserialize)]
        struct InnerItem {
            field_a: i32,
            field_b: String,
            // Added in version 3 - note that this field is optional.
            // With the compact (array) encoding, #[serde(default)] is needed so that
            // an old record with fewer elements yields None instead of an
            // "invalid length" error.
            #[serde(default)]
            field_c: Option<i16>,
        }

        // Deserialize the inner structs
        let inner_data = InnerData::deserialize(deserializer)?;

        // Get the version so we can add custom logic based on the version later on
        let version = inner_data.version;

        // Map the InnerData/InnerItem structs to Data/Item using our version-based logic.
        // The closure's return type is spelled out so that `?` inside it can infer the error type.
        let items = inner_data
            .items
            .into_iter()
            .map(|item| -> Result<Item, D::Error> {
                Ok(Item {
                    field_a: item.field_a,
                    field_b: item.field_b,
                    field_c: if version < 3 {
                        42 // Default value
                    } else {
                        // field_c is required since version 3:
                        // return its value, or a missing_field error if it is absent
                        item.field_c
                            .ok_or_else(|| D::Error::missing_field("field_c"))?
                    },
                })
            })
            .collect::<Result<Vec<_>, _>>()?;

        Ok(Data { version, items })
    }
}

A short explanation of how the deserializer works:

  • We create "dumb" inner structs that copy your structs, except that the "new" fields are optional
  • We deserialize into the inner structs
  • We map the inner structs to the outer ones using version-based logic
    • If one of the new fields is missing in a version that requires it, we return a D::Error::missing_field error
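The version-based mapping step can also be pulled out of `deserialize` into a plain function, which makes the upgrade rules easy to unit-test without any serialized data. A minimal std-only sketch, assuming a hypothetical `upgrade_item` helper and using `String` as the error type in place of `D::Error`; `42` is the same placeholder default used above.

```rust
#[derive(Debug, PartialEq)]
pub struct InnerItem {
    pub field_a: i32,
    pub field_b: String,
    pub field_c: Option<i16>, // absent in files written before version 3
}

#[derive(Debug, PartialEq)]
pub struct Item {
    pub field_a: i32,
    pub field_b: String,
    pub field_c: i16,
}

// Hypothetical helper mirroring the closure in `deserialize`.
// Inside a real Deserialize impl you would return D::Error
// (e.g. via serde::de::Error::missing_field) instead of a String.
pub fn upgrade_item(version: u8, item: InnerItem) -> Result<Item, String> {
    let field_c = if version < 3 {
        42 // default for files written before version 3
    } else {
        item.field_c
            .ok_or_else(|| "missing field `field_c`".to_string())?
    };
    Ok(Item { field_a: item.field_a, field_b: item.field_b, field_c })
}

fn main() {
    let old = InnerItem { field_a: 1, field_b: "a".to_string(), field_c: None };
    println!("{:?}", upgrade_item(2, old));
}
```

As in the answer's code, a `field_c` value present in a pre-version-3 record is ignored in favour of the default.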
Jeroen Vervaeke