I am trying to implement the following functionality in Rust.

I want to have a structure Node which holds a vector of references to other Node structures. In addition, I have a master vector which keeps all the Node structures that have been instantiated.

The key point is that the Nodes are allocated within a loop (i.e. in their own scope), while the master vector keeping all the structures (or references to them) is declared outside the loop. In my opinion this is a run-of-the-mill use case.

After a lot of trying I came up with this code, which still does not compile. I tried it with plain &Node and alternatively with RefCell<&Node>; neither compiles.

use std::cell::RefCell;
use std::collections::HashMap;

struct Node<'a> {
    name: String,
    nodes: RefCell<Vec<&'a Node<'a>>>,
}

impl<'a> Node<'a> {

    fn create(name: String) -> Node<'a> {
        Node {
            name: name,
            nodes: RefCell::new(Vec::new()),
        }
    }
    
    fn add(&self, value: &'a Node<'a>) {
        self.nodes.borrow_mut().push(value);
    }

    fn get_nodes(&self) -> Vec<&'a Node> {
        self.nodes.take()
    }
}


// Later the code ...

    let mut the_nodes_ref: HashMap<String, RefCell<&Node>> = HashMap::new();
    let mut the_nodes_nodes: HashMap<String, &Node> = HashMap::new();

    // This works
    let no1_out = Node::create(String::from("no1"));
    let no2_out = Node::create(String::from("no2"));

    no1_out.add(&no2_out);
    no2_out.add(&no1_out);

    the_nodes_nodes.insert(no1_out.name.clone(), &no1_out);
    the_nodes_nodes.insert(no2_out.name.clone(), &no2_out);

    let no1_ref_out = RefCell::new(&no1_out);
    let no2_ref_out = RefCell::new(&no2_out);

    the_nodes_ref.insert(no1_out.name.clone(), no1_ref_out);
    the_nodes_ref.insert(no2_out.name.clone(), no2_ref_out);

    // This does not work because no1 and no2 do not live long enough
    let items = [1, 2, 3];
    for _ in items {
        let no1 = Node::create(String::from("no1"));
        let no2 = Node::create(String::from("no2"));

        no1.add(&no2); // <- Error: no2 does not live long enough
        no2.add(&no1); // <- Error: no1 does not live long enough

        the_nodes_nodes.insert(no1.name.clone(), &no1);
        the_nodes_nodes.insert(no2.name.clone(), &no2);

        let no1_ref = RefCell::new(&no1);
        let no2_ref = RefCell::new(&no2);

        the_nodes_ref.insert(no1.name.clone(), no1_ref);
        the_nodes_ref.insert(no2.name.clone(), no2_ref);
    }

I roughly understand the problem, but I am wondering how to solve it. How can I allocate a structure within a separate scope (here the for loop) and then use the allocated structures outside the for loop? Allocating a structure within a loop and using it later outside the loop is surely a common use case.

I have the feeling that the missing link is telling the Rust compiler, via the lifetime parameters, that the references should stay alive outside the for loop, but I have no idea how to do that. Maybe this is not the correct way to do it at all.

Another key point is that I want the Nodes to hold references to the other Nodes, not copies of them. The same is true for the master vector: it should hold references to the allocated Nodes, not copies.

Horowitzathome

1 Answer

All this boils down to the answer to a single question: which entity in the program should own the Node values?

Right now main() owns the values. You can tell because everything else in the program only holds a &Node, which is a reference to something owned elsewhere. This is why the loop variant fails: no1 and no2 are the owned values, but they are destroyed at the end of each loop iteration, which would leave dangling references in your maps.

One way to solve this problem is to have a collection own the values. However, due to Rust's borrowing rules, you will not be able to modify the collection once you start giving out references, because that requires borrowing the collection mutably. So you'd have to create all your nodes up front, put them in the collection, and then start giving references out to the other nodes. This is the most efficient way to solve the problem, but is inflexible and binds the lifetime of all the nodes together. In real code, nodes may come and go so having them share a lifetime is impractical.
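The ownership half of that idea can be sketched on its own. This is a minimal illustration, assuming a simplified Node without links; build_nodes and node_names are hypothetical helper names. Values created inside the loop are moved into a Vec declared outside it, and references are handed out only after the collection is complete.

```rust
// Simplified node for illustration only: it has a name but no links.
struct Node {
    name: String,
}

// Creates nodes inside a loop and moves each one into the returned Vec,
// so the Vec (not the loop body) owns them.
fn build_nodes(count: usize) -> Vec<Node> {
    let mut all_nodes = Vec::new();
    for i in 0..count {
        let node = Node { name: format!("no{}", i + 1) };
        all_nodes.push(node); // ownership moves out of the loop scope here
    }
    all_nodes
}

// Hands out references only after the collection is fully built.
fn node_names(nodes: &[Node]) -> Vec<&str> {
    nodes.iter().map(|n| n.name.as_str()).collect()
}

fn main() {
    let all_nodes = build_nodes(3);
    println!("{:?}", node_names(&all_nodes));
}
```

The nodes outlive the loop because they are moved, not borrowed. The restriction described above still applies: while the references returned by node_names are in use, the Vec cannot be modified.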

The classic solution to this problem is shared ownership via Rc, but that comes with its own set of problems where you have nodes referencing each other. In that case, you might leak node objects even if you drop them from the global collection, because they still reference each other.
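To make the leak concrete, here is a small sketch, assuming a hypothetical StrongNode type that links to its peers with strong Rc references. After both local handles are dropped, a Weak observer can still be upgraded, which shows that the two nodes keep each other alive.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Hypothetical node type that links to its peers with strong references.
struct StrongNode {
    peers: RefCell<Vec<Rc<StrongNode>>>,
}

// Builds two nodes that point at each other, drops both local handles,
// and reports whether the first node is still alive afterwards.
fn cycle_keeps_node_alive() -> bool {
    let a = Rc::new(StrongNode { peers: RefCell::new(Vec::new()) });
    let b = Rc::new(StrongNode { peers: RefCell::new(Vec::new()) });

    a.peers.borrow_mut().push(Rc::clone(&b));
    b.peers.borrow_mut().push(Rc::clone(&a));

    // An observer that does not contribute to the strong count of `a`.
    let observer: Weak<StrongNode> = Rc::downgrade(&a);

    drop(a);
    drop(b);

    // Each node is still strongly referenced by the other one's peer list,
    // so neither is freed. This sketch leaks the two nodes on purpose.
    observer.upgrade().is_some()
}

fn main() {
    println!("still alive after dropping both handles: {}", cycle_keeps_node_alive());
}
```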

This is where weak references come in: a Weak lets you refer to a value managed by an Rc without keeping it alive. However, an Rc only hands out shared access to its value once other references to it exist, so adding the weak references to the nodes requires interior mutability via RefCell.
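Both points can be checked in isolation. The following is a small sketch using only standard library calls; the function names are made up for illustration. The first function shows a Weak failing to upgrade once the last Rc is gone, the second shows Rc::get_mut refusing access while a second handle exists, with RefCell providing the mutation instead.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Returns whether a weak reference can be upgraded before and after
// the only strong reference is dropped.
fn weak_expires_after_drop() -> (bool, bool) {
    let strong = Rc::new(String::from("node"));
    let weak: Weak<String> = Rc::downgrade(&strong);

    let alive_before = weak.upgrade().is_some();
    drop(strong);
    let alive_after = weak.upgrade().is_some();

    (alive_before, alive_after)
}

// Shows that direct mutable access is refused while a second handle
// exists, and that RefCell still allows mutation through either handle.
fn shared_mutation() -> (bool, Vec<i32>) {
    let mut shared = Rc::new(RefCell::new(vec![1]));
    let other = Rc::clone(&shared);

    // Rc::get_mut returns None because `other` points at the same value.
    let direct_access_denied = Rc::get_mut(&mut shared).is_none();

    // Interior mutability: mutate through a shared handle.
    other.borrow_mut().push(2);

    let contents = shared.borrow().clone();
    (direct_access_denied, contents)
}

fn main() {
    println!("{:?}", weak_expires_after_drop());
    println!("{:?}", shared_mutation());
}
```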

Let's put all this together:

use std::collections::HashMap;
use std::rc::{Rc, Weak};
use std::cell::RefCell;

struct Node {
    name: String,
    nodes: RefCell<Vec<Weak<Node>>>,
}

impl Node {
    fn new(name: String) -> Self {
        Node { name, nodes: RefCell::new(Vec::new()) }
    }
    
    fn name(&self) -> &String {
        &self.name
    }
    
    fn add(&self, value: Weak<Node>) {
        self.nodes.borrow_mut().push(value);
    }

    fn get_nodes(&self) -> Vec<Rc<Node>> {
        // Return strong references.  While we are doing this, clean out
        // any dead weak references.
        let mut strong_nodes = Vec::new();
        
        self.nodes.borrow_mut().retain(|w| match w.upgrade() {
            Some(v) => {
                strong_nodes.push(v);
                true
            },
            None => false,
        });
        
        strong_nodes
    }
}

fn main() {
    let mut the_nodes_nodes: HashMap<String, Rc<Node>> = HashMap::new();

    let items = [1, 2, 3];
    for _ in items {
        let no1 = Rc::new(Node::new(String::from("no1")));
        let no2 = Rc::new(Node::new(String::from("no2")));
        
        // downgrade creates a new Weak<T> for an Rc<T>
        no1.add(Rc::downgrade(&no2));
        no2.add(Rc::downgrade(&no1));
        
        for n in [no1, no2] {
            the_nodes_nodes.insert(n.name().clone(), n);
        }
    }
}

The nodes are strongly-referenced by the_nodes_nodes, which will keep them alive, but we can dispense further Rc or Weak instances that refer to the same node without needing to manage lifetimes nearly as strictly.

Note that when a Node is destroyed because it's removed from the map, existing Weak references to that node are no longer valid. You must invoke upgrade() on a Weak reference, which gives you back an Rc only if the Node value is still alive. The get_nodes() method wraps up this logic by returning a Vec<Rc<Node>> that strongly references only the nodes that are still alive.
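As a usage sketch, the vector returned by get_nodes() can be iterated like any other vector. The example below repeats the Node type from above so that it is self-contained; neighbor_names and neighbors_before_and_after_removal are hypothetical helpers added for illustration. It lists the neighbors of a node before and after one of them is removed from the map.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::{Rc, Weak};

struct Node {
    name: String,
    nodes: RefCell<Vec<Weak<Node>>>,
}

impl Node {
    fn new(name: String) -> Self {
        Node { name, nodes: RefCell::new(Vec::new()) }
    }

    fn add(&self, value: Weak<Node>) {
        self.nodes.borrow_mut().push(value);
    }

    // Same logic as above: upgrade live links and prune dead ones.
    fn get_nodes(&self) -> Vec<Rc<Node>> {
        let mut strong_nodes = Vec::new();
        self.nodes.borrow_mut().retain(|w| match w.upgrade() {
            Some(v) => {
                strong_nodes.push(v);
                true
            }
            None => false,
        });
        strong_nodes
    }
}

// Hypothetical helper: collects the names of the live neighbors of `node`.
fn neighbor_names(node: &Node) -> Vec<String> {
    let mut names = Vec::new();
    for neighbor in node.get_nodes() {
        names.push(neighbor.name.clone());
    }
    names
}

// Hypothetical demo: links no1 to no2 and no3, then removes no2 from the map.
fn neighbors_before_and_after_removal() -> (Vec<String>, Vec<String>) {
    let mut map: HashMap<String, Rc<Node>> = HashMap::new();
    for name in ["no1", "no2", "no3"] {
        map.insert(name.to_string(), Rc::new(Node::new(name.to_string())));
    }

    let no1 = Rc::clone(&map["no1"]);
    no1.add(Rc::downgrade(&map["no2"]));
    no1.add(Rc::downgrade(&map["no3"]));

    let before = neighbor_names(&no1);
    map.remove("no2"); // drops the only strong reference to no2
    let after = neighbor_names(&no1);

    (before, after)
}

fn main() {
    let (before, after) = neighbors_before_and_after_removal();
    println!("before: {:?}, after: {:?}", before, after);
}
```

Because the map held the only strong reference to no2, removing the entry frees that node, and the next get_nodes() call prunes the dead link.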


For the sake of completeness, here is what the non-Rc option would look like. There is a helper struct Nodes to hold the map.

use std::collections::HashMap;
use std::cell::RefCell;

struct Node<'a> {
    name: String,
    nodes: RefCell<Vec<&'a Node<'a>>>,
}

impl<'a> Node<'a> {
    fn new(name: String) -> Self {
        Node { name, nodes: RefCell::new(Vec::new()) }
    }
    
    fn name(&self) -> &String {
        &self.name
    }
    
    fn add(&self, value: &'a Node<'a>) {
        self.nodes.borrow_mut().push(value);
    }
    
    fn get_nodes(&self) -> Vec<&'a Node<'a>> {
        self.nodes.borrow().clone()
    }
}

struct Nodes<'a> {
    nodes: HashMap<String, Node<'a>>,
}

impl<'a> Nodes<'a> {
    fn new<T: IntoIterator<Item=String>>(node_names: T) -> Self {
        let mut nodes = HashMap::new();
        
        for name in node_names {
            nodes.insert(name.clone(), Node::new(name));
        }
        
        Self { nodes }
    }
    
    fn get_node(&'a self, name: &String) -> Option<&'a Node<'a>> {
        self.nodes.get(name)
    }
}

fn main() {
    let nodes = Nodes::new(["n1".to_string(), "n2".to_string()]);
    
    let n1 = nodes.get_node(&"n1".to_string()).expect("n1");
    let n2 = nodes.get_node(&"n2".to_string()).expect("n2");
    
    n1.add(n2);
    n2.add(n1);
}

Note that we have to create all nodes in advance. Adding a node requires borrowing the HashMap mutably, which we can't do while there is a reference to a value in the map. The Nodes type makes this clear by requiring an iterator of node names to create in its constructor function; adding new nodes later isn't permitted by the API.

We cannot obtain a mutable reference to a node while we hold a reference to any other node, so this approach also requires interior mutability (RefCell) for each node's node list and simply doesn't provide an API for obtaining a mutable reference to a node.
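Reading the links back works through shared references only. The sketch below repeats the types from the example above so that it is self-contained, with two deviations that are assumptions of this sketch: get_node takes a &str instead of a &String, and names_linked_from_n1 is a hypothetical helper.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

struct Node<'a> {
    name: String,
    nodes: RefCell<Vec<&'a Node<'a>>>,
}

impl<'a> Node<'a> {
    fn new(name: String) -> Self {
        Node { name, nodes: RefCell::new(Vec::new()) }
    }

    fn add(&self, value: &'a Node<'a>) {
        self.nodes.borrow_mut().push(value);
    }

    fn get_nodes(&self) -> Vec<&'a Node<'a>> {
        self.nodes.borrow().clone()
    }
}

struct Nodes<'a> {
    nodes: HashMap<String, Node<'a>>,
}

impl<'a> Nodes<'a> {
    fn new<T: IntoIterator<Item = String>>(node_names: T) -> Self {
        let mut nodes = HashMap::new();
        for name in node_names {
            nodes.insert(name.clone(), Node::new(name));
        }
        Self { nodes }
    }

    // Assumption of this sketch: the lookup key is a &str, not a &String.
    fn get_node(&'a self, name: &str) -> Option<&'a Node<'a>> {
        self.nodes.get(name)
    }
}

// Hypothetical helper: links n1 to n2 and n3, then lists n1's links by name.
fn names_linked_from_n1() -> Vec<String> {
    let nodes = Nodes::new(["n1".to_string(), "n2".to_string(), "n3".to_string()]);

    let n1 = nodes.get_node("n1").expect("n1");
    let n2 = nodes.get_node("n2").expect("n2");
    let n3 = nodes.get_node("n3").expect("n3");

    // Only shared references are involved; RefCell provides the mutation.
    n1.add(n2);
    n1.add(n3);
    n2.add(n1);

    let mut names = Vec::new();
    for linked in n1.get_nodes() {
        names.push(linked.name.clone());
    }
    names
}

fn main() {
    println!("{:?}", names_linked_from_n1());
}
```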

cdhowie
  • Thank you for this explanation; I don't think I would have been able to figure this out myself. Now I have another question: how can I get or iterate over the nodes of a node? I.e. when I call get_nodes(), how do I access the nodes? – Horowitzathome Feb 20 '22 at 19:33
  • @Horowitzathome It's just a vector; iterate over it as you would any vector. Note that I've updated the code in the original example, plus added a second example showing how you would do this without `Rc`. – cdhowie Feb 20 '22 at 20:27
  • Regarding your first solution: how would I do it if I first try to get the nodes from the hash map, inserting any that do not exist yet, and then call the functions to add the nodes to each other? In this case I have the problem that I have borrowed the hash map the_nodes_nodes twice, because I need not just one entry from the hash map but both at the same time. This question is related to the question I asked here: https://stackoverflow.com/questions/71243406/how-to-modify-an-entry-in-a-hashmap-after-more-than-one-entry-has-been-added-to – Horowitzathome Feb 24 '22 at 03:34
  • @Horowitzathome Aha, yes, I also answered that question. :) That explains the motivation. The answer is the same as I gave in that question: insert the nodes first but don't retain references, then go back for the references once that's done. You can't hold on to even immutable references to values in a map while you add new nodes to the map, because adding new nodes can cause the map to need to grow its internal allocation, which requires moving all the existing values, invalidating existing references. – cdhowie Feb 24 '22 at 03:54
  • @Horowitzathome However, if you're using the `Rc` version of my code above, then you _can_ hold on to the `Rc` _values_. – cdhowie Feb 24 '22 at 04:10
  • Hmm, somehow I am not able to first add the nodes and then get and modify them. See this playground: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=d8d808dd71be181d74ac604f99af8c08 First I add the nodes to the hash map, keeping no local references. Then I get the first node and then the second, to add the second to the first. I get the usual error that I have a second mutable borrow. I understand the error, but have no idea what to do so that I can add one node to the other after I have added the nodes to the hash map. – Horowitzathome Feb 24 '22 at 05:27
  • @Horowitzathome Two problems: (1) `Node::add_node` doesn't need to borrow the `Rc` mutably. Change `&mut` to `&`. (2) `Node::add_node` borrows `self` immutably, so there is no reason to use `get_mut()` when fetching nodes in the last statement of `main()`; use `get()` instead. https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=571d3419cab432981b2faed056b20295 – cdhowie Feb 24 '22 at 07:10
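
The get-or-insert pattern discussed in the comments can be sketched with the `Rc` version as follows. This is one possible arrangement, not the code from the linked playgrounds; `get_or_insert`, `link`, `peer_names` and `demo` are hypothetical helper names. Each lookup clones the `Rc` handle, so no borrow of the map is held while the second node is fetched or inserted, and `add` only needs `&self`.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::{Rc, Weak};

struct Node {
    name: String,
    nodes: RefCell<Vec<Weak<Node>>>,
}

impl Node {
    fn new(name: String) -> Self {
        Node { name, nodes: RefCell::new(Vec::new()) }
    }

    // Takes &self, so a shared handle is enough to add a link.
    fn add(&self, value: Weak<Node>) {
        self.nodes.borrow_mut().push(value);
    }
}

// Hypothetical helper: returns a strong handle to the named node,
// inserting the node first if it does not exist yet.
fn get_or_insert(map: &mut HashMap<String, Rc<Node>>, name: &str) -> Rc<Node> {
    let handle: &Rc<Node> = map
        .entry(name.to_string())
        .or_insert_with(|| Rc::new(Node::new(name.to_string())));
    // Cloning the Rc ends the borrow of the map when this function returns.
    Rc::clone(handle)
}

// Hypothetical helper: makes `from` refer to `to`, creating either if needed.
fn link(map: &mut HashMap<String, Rc<Node>>, from: &str, to: &str) {
    let a = get_or_insert(map, from);
    let b = get_or_insert(map, to);
    a.add(Rc::downgrade(&b));
}

// Hypothetical helper: names of the peers of `node` that are still alive.
fn peer_names(node: &Node) -> Vec<String> {
    let links = node.nodes.borrow();
    let mut names = Vec::new();
    for weak in links.iter() {
        if let Some(peer) = weak.upgrade() {
            names.push(peer.name.clone());
        }
    }
    names
}

fn demo() -> (usize, Vec<String>) {
    let mut map = HashMap::new();
    link(&mut map, "no1", "no2");
    link(&mut map, "no2", "no1");
    link(&mut map, "no1", "no3");
    let names = peer_names(&map["no1"]);
    (map.len(), names)
}

fn main() {
    let (count, names) = demo();
    println!("{} nodes, no1 -> {:?}", count, names);
}
```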