Please explain the Serde rc feature

Opt into impls for Rc<T> and Arc<T>. Serializing and deserializing these types does not preserve identity and may result in multiple copies of the same data. Be sure that this is what you want before enabling this feature.

Serializing a data structure containing reference-counted pointers will serialize a copy of the inner value of the pointer each time a pointer is referenced within the data structure. Serialization will not attempt to deduplicate these repeated data.

Deserializing a data structure containing reference-counted pointers will not attempt to deduplicate references to the same data. Every deserialized pointer will end up with a strong count of 1.

Why does this feature flag exist and why isn't it default behaviour? What does it mean by

Serializing and deserializing these types does not preserve identity and may result in multiple copies of the same data

I know that it is related to Serde issue 194. The last message of the issue says

If you want to make sure you don't accidentally end up with a derived impl containing an rc, open a clippy issue.

Does the feature flag exist to catch unexpected usages of an Rc struct?

  • The sentence "may result in multiple copies of the same data" seems understandable to me — what do you find confusing about it? – Shepmaster Mar 09 '20 at 16:15
  • Why does it end up with multiple copies? For instance, if it deserializes a string and then wraps it in an `Rc` struct, there would only be one copy. Where do the multiple copies come from? I can understand it for serializing, as it takes a copy to put in a file buffer. – NebulaFox Mar 09 '20 at 16:18

1 Answer

As stated in Serde issue 194, the drawbacks of the implementation of deserializing to Rc or Arc are:

  • Potentially increased memory usage
  • Equality comparisons that rely on pointer addresses break
  • Interior mutability is not reflected in copies

This is echoed in the feature flag documentation:

Serialization will not attempt to deduplicate these repeated data.

Deserializing a data structure containing reference-counted pointers will not attempt to deduplicate references to the same data.

The usual point of an Rc or Arc is to share data. This sharing doesn't happen when deserializing to a struct containing Rc or Arc. In this example, 5 completely distinct Rc<str>s are created with no relation to each other, even though they all have the same content:

```rust
use std::{rc::Rc, ptr};

fn main() {
    let json = r#"[
        "alpha",
        "alpha",
        "alpha",
        "alpha",
        "alpha"
    ]"#;

    let strings = serde_json::from_str::<Vec<Rc<str>>>(json).unwrap();

    dbg!(ptr::eq(strings[0].as_ref(), strings[1].as_ref()));
}
```

```text
[src/main.rs:14] ptr::eq(strings[0].as_ref(), strings[1].as_ref()) = false
```
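
For contrast, here is a minimal sketch of what sharing looks like when the same five strings are built in memory rather than deserialized. It uses only the standard library; `shared_handles` is a hypothetical helper introduced for illustration and is not part of Serde.

```rust
use std::rc::Rc;

/// Builds `n` handles to one allocation, the way code typically shares data
/// in memory (hypothetical helper, not part of Serde).
fn shared_handles(value: &str, n: usize) -> Vec<Rc<str>> {
    let first: Rc<str> = Rc::from(value);
    (0..n).map(|_| Rc::clone(&first)).collect()
}

fn main() {
    let strings = shared_handles("alpha", 5);

    // All five handles point at the same allocation.
    assert!(Rc::ptr_eq(&strings[0], &strings[1]));

    // `first` was dropped when the helper returned, leaving five owners.
    assert_eq!(Rc::strong_count(&strings[0]), 5);
}
```

A value like this serializes to five copies of "alpha", and deserializing that output yields five separate allocations, each with a strong count of 1.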

This is especially bad when you have an Rc<RefCell<_>> or other type with interior mutability, as you might expect that modifying one of the items modifies all of the items.
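
The interior mutability problem can be sketched without Serde at all. The example below models a round trip by hand, assuming only the behaviour the feature documentation describes: one copy of the inner value written per pointer, and one fresh allocation created per value read. `to_plain` and `from_plain` are hypothetical stand-ins, not Serde APIs.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Shared = Rc<RefCell<String>>;

/// Hypothetical stand-in for serialization: copies out the inner value of
/// every pointer, once per pointer, with no deduplication.
fn to_plain(items: &[Shared]) -> Vec<String> {
    items.iter().map(|item| item.borrow().clone()).collect()
}

/// Hypothetical stand-in for deserialization: wraps every value in a fresh
/// allocation, so each pointer starts with a strong count of 1.
fn from_plain(values: Vec<String>) -> Vec<Shared> {
    values.into_iter().map(|v| Rc::new(RefCell::new(v))).collect()
}

fn main() {
    let shared: Shared = Rc::new(RefCell::new(String::from("alpha")));
    let original = vec![Rc::clone(&shared), Rc::clone(&shared)];

    // Before the round trip, a write through one pointer is visible through the other.
    original[0].borrow_mut().push('!');
    assert_eq!(*original[1].borrow(), "alpha!");

    let restored = from_plain(to_plain(&original));

    // After the round trip, the two pointers no longer share an allocation.
    assert!(!Rc::ptr_eq(&restored[0], &restored[1]));
    assert_eq!(Rc::strong_count(&restored[0]), 1);

    // A write through one restored pointer leaves the other unchanged.
    restored[0].borrow_mut().push('?');
    assert_eq!(*restored[1].borrow(), "alpha!");
}
```

The values compare equal immediately after the round trip, so the loss of sharing only shows up once something mutates one of them.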

  • Interesting. Your example is exactly the behaviour I would expect: 5 completely distinct `Rc`s. `Rc` does not guarantee uniqueness. If I were looking for uniqueness, then I would want to use a Set-like structure. Equality would then compare the data the `Rc` struct points to, not the pointer itself. – NebulaFox Mar 09 '20 at 16:27
  • @NebulaFox that means that you are ok with serialization and deserialization [not being a round-trip](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=f22d1685bea7119ff78b2440ca9b6e87). – Shepmaster Mar 09 '20 at 16:33
  • I understand now. Going one direction doesn't incur a memory penalty, but going back and forth does. – NebulaFox Mar 09 '20 at 16:37