1

I'm going through Mazes for Programmers and thought I'd try to do it in Rust (the original code is in Ruby).

One of the first things you do in the book is to implement a Cell class that includes links to other cells. Naively, I thought I'd be able to do something like this:

use std::collections::HashMap;

pub struct Cell {
    row: u32,
    column: u32,

    links: HashMap<Box<Cell>, bool>,
}

The idea being that each Cell in a maze knows what other cells it connects to, and that a Maze is just a collection of Cells. Unfortunately, this leads one into the doubly-linked-list problem of Rust's borrowing and ownership semantics, and I'm not sure how to get around this.

Specifically, I cannot figure out how to implement link here:

impl Cell {
    pub fn new(row: u32, column: u32) -> Cell {
        Cell {
            row: row,
            column: column,
            links: HashMap::new(),
        }
    }

    pub fn link(&mut self, other: &mut Cell, bidi: bool) {
        self.links.insert(Box::new(other), true); // cannot get anything even remotely like this to work here.
    }
}

I took a look at the Rust Book's section on choosing your guarantees, which has some good info on one's respective choices of pointer-wrappers and abstractions, but I'm really not sure how to reason about what I'd like to do with the compiler.

I think that I want each Cell struct to maintain (via an embedded HashMap) a reference (in the non-explicit use of the term) to other Cells that it is connected to, but since that link will be bi-directional it seems to me that I'm going to end up using Rc or something similar.

It's also possible that I should just take a step away from the literal translation of the Ruby code in the book and think about a more idiomatic approach -- possibly ignoring the issue of Cell structs entirely in favor of something more direct like a big HashMap of tuples.

Venantius
  • Your title asks about having the same type, but that's not a problem; [your code as presented will *compile*](http://play.integer32.com/?gist=9be3072407fb654fc0035404120150a5&version=stable). Perhaps you want to revisit the title after reviewing what question you are asking in the body? – Shepmaster Jan 05 '17 at 16:47
  • Also, your "question" feels like it is missing a **question**. Reading through the body of the post, I'm just nodding and saying "yes, that sounds right" to most everything. What do you want from the Stack Overflow community? – Shepmaster Jan 05 '17 at 16:49
  • Edited the original question to clarify that I need help implementing `link` – Venantius Jan 05 '17 at 16:57

2 Answers

2

How can I create a struct with a HashMap where the keys are the same type as the container

the original code is in Ruby

I'm not that well acquainted with Ruby, but I know a lot of similar languages so I'll make some educated guesses.

Ruby is a thoroughly garbage-collected language. This means that you never actually store a value of some type as a key in a Ruby Hash. You might think that you're storing a key of type Cell, but what really happens is that you're storing a garbage-collected pointer to an instance of that Cell.

And a pointer is just an index into some memory region.

Now, Rust is a different kind of language. In Rust you can store the actual Cell right there in the underlying keys array of the HashMap. In other words, Rust gives you more control, and that often translates to better memory usage and speed.

But with control comes responsibility (unless you want to break things and break them fast). In Rust you are supposed to explicitly specify who owns the instance. And when you create a struct that points to itself (a cyclic graph), the ownership gets muddled.

So one way to implement a cyclic structure is to move the ownership concern out of the equation!

In Ruby the garbage collector solves this. Memory management is the concern of the garbage collector and you implement cyclic structures without ever dealing with it.

In Rust you can do the same by using an (experimental) garbage collector, maybe rust-gc.

Or you can get your hands dirty and actually manage the memory.

This isn't as hard as it sounds. All you need is some non-cyclic structure to own the Cell. Vec<Cell> would suffice. After you have stored the Cell in a vector, you don't have to worry about the ownership anymore, because the Cell is now owned by that vector, plain and simple. You can make cyclic structures, referencing the Cell by its vector index, just like Ruby does with its pointers!
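To make that concrete, here is a minimal sketch of the vector-plus-indices idea. The `Maze` type, the row-major `index` helper and `is_linked` are names I made up for illustration, and I've swapped the question's `HashMap<_, bool>` for a `HashSet<usize>` on the assumption that only "linked or not" matters.

```rust
use std::collections::HashSet;

pub struct Cell {
    pub row: u32,
    pub column: u32,
    // Indices (into `Maze::cells`) of the cells this one is linked to.
    links: HashSet<usize>,
}

// Hypothetical owner of all cells; the Vec is the only thing that owns a Cell.
pub struct Maze {
    columns: u32,
    cells: Vec<Cell>,
}

impl Maze {
    pub fn new(rows: u32, columns: u32) -> Maze {
        let mut cells = Vec::with_capacity((rows * columns) as usize);
        for row in 0..rows {
            for column in 0..columns {
                cells.push(Cell { row, column, links: HashSet::new() });
            }
        }
        Maze { columns, cells }
    }

    // Row-major position of the cell at (row, column) inside `cells`.
    pub fn index(&self, row: u32, column: u32) -> usize {
        (row * self.columns + column) as usize
    }

    // Linking only touches indices, so no cell ever borrows another cell.
    pub fn link(&mut self, a: usize, b: usize, bidi: bool) {
        self.cells[a].links.insert(b);
        if bidi {
            self.cells[b].links.insert(a);
        }
    }

    pub fn is_linked(&self, a: usize, b: usize) -> bool {
        self.cells[a].links.contains(&b)
    }
}
```

Note that `link` now lives on the owner rather than on `Cell`, which is what sidesteps the `&mut self` / `&mut other` problem from the question.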

There are all kinds of variations on this. When you manage the ownership, you are the boss, the choice is yours. You can optimize some algorithms with memory pools, also known as arenas. You can use reference counting (there's no need for a vector then, but you'd want to carefully clear the references when removing the cells from your structure - in order to break the reference counting cycles). You can use a kind of deque to allocate the memory in chunks but without reallocations that vector does, then store direct pointers into that deque. Some of the options are mentioned in this reddit discussion.
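As a rough sketch of the reference-counting variant, something along these lines should work. `CellRef`, `new_cell`, `link` and `unlink_all` are hypothetical names, and storing the links in a `Vec` behind `Rc<RefCell<..>>` is my assumption, not something taken from the book or the question.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Shared, mutable handle to a cell (hypothetical alias).
pub type CellRef = Rc<RefCell<Cell>>;

pub struct Cell {
    pub row: u32,
    pub column: u32,
    // Strong references to linked cells; two linked cells form a cycle.
    pub links: Vec<CellRef>,
}

pub fn new_cell(row: u32, column: u32) -> CellRef {
    Rc::new(RefCell::new(Cell { row, column, links: Vec::new() }))
}

// Bidirectional link: each cell holds a strong reference to the other.
pub fn link(a: &CellRef, b: &CellRef) {
    a.borrow_mut().links.push(Rc::clone(b));
    b.borrow_mut().links.push(Rc::clone(a));
}

// Break the cycles through `cell` so the reference counts can drop to zero.
pub fn unlink_all(cell: &CellRef) {
    // Move the links out first so `cell` is not borrowed while its
    // neighbours are being mutated.
    let neighbours = std::mem::take(&mut cell.borrow_mut().links);
    for n in neighbours {
        n.borrow_mut().links.retain(|other| !Rc::ptr_eq(other, cell));
    }
}
```

If `unlink_all` is never called before the last outside handle is dropped, two linked cells keep each other alive and leak, which is the cycle problem mentioned above.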

But the principle is simple. Make sure that something (Vec, Rc, Gc) covers the ownership concern. And when that's covered, you can program just like in Ruby, because the ownership is no longer an issue.

ArtemGr
  • *You can use a kind of deque to allocate the memory in chunks but without reallocations that vector does* — that's how many of the arenas are implemented. – Shepmaster Jan 05 '17 at 22:42
  • @Shepmaster That's why deque is such a good example. It's a general purpose structure, yet unlike a vector it's usually designed with the pointer preservation in mind. In C++ this property is carved in stone: "*insertion and deletion at either end of a deque never invalidates pointers or references to the rest of the elements*" - http://en.cppreference.com/w/cpp/container/deque. – ArtemGr Jan 05 '17 at 22:57
  • Yeah, this has been the avenue that has seemed the most promising (I was looking specifically at `Rc` and seeing a lot of chatter around `Vec`), although I have trouble finding clear examples on the usage of `Rc` whereas quite a few people seem to go the route of `Vec`. So, to be clear, the idea here is that since each `Cell` is stored in a `Vec`, they can also include a `HashMap` that points to other `Cells` stored in that same `Vec`? – Venantius Jan 05 '17 at 23:05
  • In summary: Have Maze own the Cells and represent the links as lists of raw pointers or (row,column) pairs. – Matt Timmermans Jan 05 '17 at 23:07
1

possibly ignoring the issue of Cell structs entirely in favor of something more direct like a big HashMap of tuples.

This is the approach I would normally go for in a "two dimensional grid with holes" structure (Game of Life style). However, in this case, you find that you want to be able to state which cells connect to other cells, so then you need to keep around a bunch of booleans (or a bitmask or other representation). And then you will want to perform traversal from one cell to another, and then realize that you don't want to cycle back on yourself.
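Here is a small sketch of where the "big HashMap of tuples" idea tends to end up, assuming cells are identified by `(row, column)` pairs. `Pos`, `Grid` and `reachable` are hypothetical names, and I've used a set of neighbours per cell instead of booleans.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// A cell is identified purely by its coordinates.
pub type Pos = (u32, u32);

#[derive(Default)]
pub struct Grid {
    // For each cell, the set of cells it is linked to.
    links: HashMap<Pos, HashSet<Pos>>,
}

impl Grid {
    // Record a bidirectional link between two positions.
    pub fn link(&mut self, a: Pos, b: Pos) {
        self.links.entry(a).or_default().insert(b);
        self.links.entry(b).or_default().insert(a);
    }

    // Breadth-first walk from `start`. The `seen` set is what stops the
    // traversal from cycling back on itself.
    pub fn reachable(&self, start: Pos) -> HashSet<Pos> {
        let mut seen = HashSet::new();
        let mut queue = VecDeque::new();
        seen.insert(start);
        queue.push_back(start);
        while let Some(pos) = queue.pop_front() {
            if let Some(neighbours) = self.links.get(&pos) {
                for &next in neighbours {
                    // `insert` returns false if `next` was already visited.
                    if seen.insert(next) {
                        queue.push_back(next);
                    }
                }
            }
        }
        seen
    }
}
```

At this point the structure is an adjacency list keyed by coordinates, which is a graph in all but name.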

Quickly, you realize that you are describing a graph. As ArtemGr says:

when you create a struct that points to itself (a cyclic graph), the ownership gets muddled.

One of the most popular Rust graph libraries, petgraph, makes use of the described "vector of nodes". From the documentation:

Pros and Cons of Indices

  • The fact that the node and edge indices in the graph each are numbered in compact intervals (from 0 to n - 1 for n nodes) simplifies some graph algorithms.

  • You can select graph index integer type after the size of the graph. A smaller size may have better performance.

  • Using indices allows mutation while traversing the graph, see Dfs, and .neighbors(a).detach().

  • You can create several graphs using the equal node indices but with differing weights or differing edges.

  • The Graph is a regular rust collection and is Send and Sync (as long as associated data N and E are).

  • Some indices shift during node or edge removal, so that is a drawback of removing elements. Indices don't allow as much compile time checking as references.

I'd still use a library like this because I believe that the author is more likely to have spent time optimizing it than I have.

Shepmaster