
I am finding it extremely difficult to work with vectors of HashMaps in Rust. I want to use a vector of HashMaps to implement a sparse matrix container, subject to constraints that I will hopefully be able to explain below.

Most importantly, I want to minimize access and modification time, including amortized allocation costs. Second, I want to reduce memory usage as much as possible without increasing access time, or at least not significantly.

About execution time: I will do a lot of lookups, so I need the best possible access time, which I expect a vector to give me, as opposed to a single `HashMap<(INDEX, KEY), Value>` as suggested in the comments. Correct me if I am wrong, but hashing and collision handling would impact performance. My key is a complex struct. I am considering simplifying it into an enum so that hashing might be cheaper, but I am not sure that will be possible.
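
For comparison, here is a minimal sketch of the single-map layout mentioned above, assuming the complex key can be reduced to a small `Copy` enum. `Key`, `FlatSparse`, and the variant names are hypothetical placeholders, not the question's real types.

```rust
use std::collections::HashMap;

// Hypothetical simplified key: a small Copy enum, as the question considers.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Key {
    Id(u32),
    Tag(u8),
}

/// Sparse storage in one map keyed by (index, key): absent cells cost nothing,
/// and there is no per-row map header at all.
struct FlatSparse<V> {
    cells: HashMap<(usize, Key), V>,
}

impl<V> FlatSparse<V> {
    fn new() -> Self {
        FlatSparse { cells: HashMap::new() }
    }

    /// Inserts or overwrites a cell, returning the previous value if any.
    fn insert(&mut self, index: usize, key: Key, value: V) -> Option<V> {
        self.cells.insert((index, key), value)
    }

    /// A single hash lookup: index and key are hashed together.
    fn get(&self, index: usize, key: Key) -> Option<&V> {
        self.cells.get(&(index, key))
    }
}

fn main() {
    let mut m = FlatSparse::new();
    m.insert(99_999_999, Key::Id(3), 1.5_f64);
    m.insert(7, Key::Tag(1), 2.0);
    assert_eq!(m.get(99_999_999, Key::Id(3)), Some(&1.5));
    assert_eq!(m.get(7, Key::Id(1)), None);
    println!("stored cells: {}", m.cells.len());
}
```

Because the whole `(index, key)` tuple is hashed on every lookup, a cheap-to-hash `Copy` key helps here; whether this beats indexing a vector and then hashing into a per-row map is something only a benchmark on the real data can settle.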

About memory: I need to avoid the unnecessary overhead of empty HashMaps, each of which takes 48 bytes. My vector has on the order of 100 million elements, so the empty maps alone would take about 4.8 GB (10^8 × 48 bytes), and the whole structure can easily grow to tens of GB. To avoid this waste, I opted to use `Option<Box<HashMap<K, V>>>`. As I understand it, I need nullability and cannot use `Box` directly, because a `Box` can never be null: it would force me to keep an empty HashMap (48 bytes) somewhere on the heap, plus an additional 8 bytes per element for the Box pointer itself. With `Option<Box<...>>`, the `None` variant takes only 8 bytes, so memory use drops by up to 6× (48 / 8) if all the elements would otherwise be empty HashMaps.
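
To check these numbers, a small sketch that prints `std::mem::size_of` for each candidate layout. The `u64`/`f64` key and value types are placeholders, and the exact `HashMap` size is a standard-library implementation detail.

```rust
use std::collections::HashMap;
use std::mem::size_of;

// Placeholder key/value types; the question's real types are not shown.
type Map = HashMap<u64, f64>;

fn main() {
    // Inline size of the map header itself (an empty map has no heap storage).
    println!("HashMap:              {} bytes", size_of::<Map>());
    // Niche optimization: Option<HashMap> is no larger than HashMap.
    println!("Option<HashMap>:      {} bytes", size_of::<Option<Map>>());
    // A Box is one pointer; Option<Box<_>> encodes None as the null pointer.
    println!("Box<HashMap>:         {} bytes", size_of::<Box<Map>>());
    println!("Option<Box<HashMap>>: {} bytes", size_of::<Option<Box<Map>>>());

    // Rough per-vector cost for 100 million slots, all empty.
    let n: usize = 100_000_000;
    println!("Vec<HashMap>:         ~{} MB", n * size_of::<Map>() / 1_000_000);
    println!("Vec<Option<Box<..>>>: ~{} MB", n * size_of::<Option<Box<Map>>>() / 1_000_000);
}
```

On a 64-bit target, `Box<HashMap<..>>` and `Option<Box<HashMap<..>>>` should each report one pointer (8 bytes), because `Option` reuses the Box's never-null representation for `None`. `Option<HashMap<..>>` should report the same size as `HashMap` itself, so the `Option` alone saves nothing here; only the `Box` indirection does.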

I am very open to alternative sparse matrix implementations that satisfy my needs, but in this question I want to focus on Rust patterns for memory-efficient representations of a vector of hash maps that also avoid what seem to me unnecessarily verbose chains of Option and pointer types (Box, Rc, and friends). Is there a better approach for handling vectors of nullable HashMaps in Rust?
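
One common way to contain the `Option`/`Box` verbosity is to wrap the vector in a small type and keep the plumbing inside its methods. This is a sketch under assumptions: `SparseRows` and its methods are hypothetical names, and freeing a row's map when it becomes empty is one possible policy, not a requirement.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical wrapper: a vector of optional, boxed maps, so an empty row
/// costs one pointer. All Option/Box handling stays inside this impl.
struct SparseRows<K, V> {
    rows: Vec<Option<Box<HashMap<K, V>>>>,
}

impl<K: Hash + Eq, V> SparseRows<K, V> {
    /// Creates `n` empty rows without allocating any maps.
    fn new(n: usize) -> Self {
        let mut rows = Vec::with_capacity(n);
        rows.resize_with(n, || None);
        SparseRows { rows }
    }

    /// Looks up `key` in row `i`; out-of-range or empty rows yield None.
    fn get(&self, i: usize, key: &K) -> Option<&V> {
        self.rows.get(i)?.as_ref()?.get(key)
    }

    /// Inserts into row `i`, allocating that row's map on first use.
    /// Panics if `i` is out of range, like indexing a Vec.
    fn insert(&mut self, i: usize, key: K, value: V) -> Option<V> {
        self.rows[i]
            .get_or_insert_with(|| Box::new(HashMap::new()))
            .insert(key, value)
    }

    /// Removes a key and frees the row's map once it becomes empty.
    fn remove(&mut self, i: usize, key: &K) -> Option<V> {
        let slot = self.rows.get_mut(i)?;
        let map = slot.as_mut()?;
        let old = map.remove(key);
        if map.is_empty() {
            *slot = None;
        }
        old
    }
}

fn main() {
    let mut m: SparseRows<u32, f64> = SparseRows::new(1_000);
    m.insert(42, 7, 3.5);
    assert_eq!(m.get(42, &7), Some(&3.5));
    assert_eq!(m.get(41, &7), None);
    assert_eq!(m.remove(42, &7), Some(3.5));
    assert!(m.rows[42].is_none()); // the row's map was freed again
}
```

Callers only see `get`, `insert`, and `remove`; the `?` operator on `Option` keeps each lookup to one line, and `get_or_insert_with` allocates a row's map lazily on its first insertion.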

Redirectk
  • I understand nothing of your complaint. It would be shorter to explain what you are trying to do instead of talking about the solution you think you need. – Stargateur Apr 15 '23 at 15:55
  • @Stargateur added more details – Redirectk Apr 15 '23 at 17:53
  • I read through your whole question, and don't understand where the problem lies. Did you find it hard to work with a `Vec – user4815162342 Apr 15 '23 at 19:51
  • @user4815162342 yes indeed, I find it verbose and boilerplate-y and I wonder if there is a better way (i.e. avoid the chain altogether) or if I just have to get used to it :) – Redirectk Apr 15 '23 at 20:11
  • @Redirectk I'm probably repeating myself at this point, but to make it 100% clear, I think you should get used to it, and I don't even see where the boilerplate is. The alternative is to switch to the `(index, key)` single-hashmap approach which seems like a better choice for sparse lookups, and will result in a smaller number of allocations. In either case, if you have a specific problem with something being unreasonably hard or verbose, please do post a more concrete question. Otherwise, I think this is a great way to learn Rust (I assume you're new to the language) - have fun! – user4815162342 Apr 15 '23 at 20:15
  • I think @user4815162342 and @Ahmed Masud are right that using a single `HashMap` is the right approach unless you have some really arcane requirements. If you really want to use `Vec – willtunnels Apr 15 '23 at 23:11

1 Answer


According to the Rust docs, `HashMap::new` creates a map with a capacity of 0, so it does not allocate on the heap until the first insertion. In fact, Rust's standard containers generally behave this way (e.g. `Vec::new` and `String::new`). There should be no need to wrap your HashMaps in an Option.
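
A small sketch of the behavior the docs describe, with placeholder `u64`/`f64` types: a freshly created map reports `capacity() == 0`, and only the first insertion allocates. Note that "no heap allocation" is not "zero size": the map header is still stored inline, which is the cost the question is measuring.

```rust
use std::collections::HashMap;

fn main() {
    // HashMap::new starts with capacity 0: no heap allocation yet.
    let empty: HashMap<u64, f64> = HashMap::new();
    assert_eq!(empty.capacity(), 0);

    // The same holds for other std containers.
    let v: Vec<u64> = Vec::new();
    let s = String::new();
    assert_eq!(v.capacity(), 0);
    assert_eq!(s.capacity(), 0);

    // The first insertion is what triggers an allocation.
    let mut m = empty;
    m.insert(1, 2.0);
    assert!(m.capacity() >= 1);

    // No heap allocation does not mean zero size: the header lives inline.
    println!("inline size: {} bytes", std::mem::size_of::<HashMap<u64, f64>>());
}
```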

The fact that nullability must be made explicit in Rust is to ensure that API contracts are clear to both the library writer and the user. As with all things, "there is no free lunch" and making this explicit can require a few unwraps here and there.
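
As an illustration that explicit nullability need not mean `unwrap` everywhere, here is a sketch of lookups through the question's `Vec<Option<Box<HashMap<..>>>>` layout using `?`, `Option::as_deref`, and `and_then`. The `Rows` alias and the `u32`/`f64` types are placeholders, and treating a missing entry as `0.0` in `lookup_or_default` is an assumption about the matrix semantics.

```rust
use std::collections::HashMap;

// Placeholder row storage matching the Option<Box<HashMap<..>>> layout.
type Rows = Vec<Option<Box<HashMap<u32, f64>>>>;

/// `?` style: out-of-range index, empty row, and missing key all give None.
fn lookup(rows: &Rows, i: usize, key: u32) -> Option<f64> {
    rows.get(i)?.as_deref()?.get(&key).copied()
}

/// Equivalent `and_then` chain, falling back to a default for missing entries.
fn lookup_or_default(rows: &Rows, i: usize, key: u32) -> f64 {
    rows.get(i)
        .and_then(|row| row.as_deref())
        .and_then(|map| map.get(&key))
        .copied()
        .unwrap_or(0.0) // assumed implicit zero of a sparse matrix
}

fn main() {
    let mut rows: Rows = vec![None; 4];
    rows[2] = Some(Box::new(HashMap::from([(7, 1.5)])));
    assert_eq!(lookup(&rows, 2, 7), Some(1.5));
    assert_eq!(lookup(&rows, 1, 7), None); // empty row
    assert_eq!(lookup(&rows, 9, 7), None); // out of range
    assert_eq!(lookup_or_default(&rows, 3, 7), 0.0);
}
```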

Finally, note that while HashMap is not `Copy`, it is `Clone` if its keys, values, and hasher state are `Clone` (the default state, `RandomState`, is in fact `Clone`). As the Rust docs state, Clone "[d]iffers from Copy in that Copy is implicit and an inexpensive bit-wise copy, while Clone is always explicit and may or may not be expensive."
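
Because `Clone` is enough for `vec![elem; n]`, a large vector of empty slots can be created in one line, as this sketch with placeholder types shows. `resize_with` is an alternative that takes a constructor closure and also works for element types that are not `Clone`.

```rust
use std::collections::HashMap;

// Placeholder row type in the question's Option<Box<HashMap<..>>> layout.
type Row = Option<Box<HashMap<u32, f64>>>;

fn main() {
    let n = 1_000_000;

    // `vec![elem; n]` clones `elem`, so it needs Clone, not Copy.
    // Cloning `None` allocates nothing per slot.
    let rows: Vec<Row> = vec![None; n];
    assert!(rows.iter().all(Option::is_none));

    // Same idea with plain maps: a clone of an empty map is still empty.
    let maps: Vec<HashMap<u32, f64>> = vec![HashMap::new(); 3];
    assert!(maps.iter().all(|m| m.capacity() == 0));

    // resize_with builds each element from a closure instead of cloning.
    let mut lazy: Vec<Row> = Vec::new();
    lazy.resize_with(n, || None);
    assert_eq!(lazy.len(), n);
}
```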

willtunnels
  • An empty HashMap still requires 48 bytes according to `size_of`, while an Option only requires 8. Having millions of empty hash maps in the vector easily makes my program consume tens of GB – Redirectk Apr 15 '23 at 15:19
  • @Redirectk Just to avoid the confusion of further readers, you probably very loosely use the term "option" to refer to an `Option<Box<HashMap<K, V>>>` or a similar indirection, because [an `Option<HashMap<K, V>>` requires the exact same amount of memory as a `HashMap`](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=e934166c141774a539d0757187c3afa4) – cafce25 Apr 15 '23 at 15:50
  • @cafce25 you're right, but you forgot the Option in the type XD – Stargateur Apr 15 '23 at 15:57
  • @cafce25 is exactly right. If you wanted to avoid the 48 bytes, you would use a `Box`. This is orthogonal to whether you use an `Option`. I understand the confusion though because most languages conflate nullability with allocation strategy (e.g. Java). – willtunnels Apr 15 '23 at 15:58
  • @Redirectk if you need millions of HashMaps in a vector, perhaps you need to write a custom container. For example, have you considered a hashmap of the form `HashMap<(INDEX, KEY), Value>` to replace your entire vector of hashmaps, where `INDEX` is the position in this large vector and `KEY` is the local hashmap key? When you start running into structures that large, you have to build your own data structure to optimize the situation, because you know more about it than the generic implementations do – Ahmed Masud Apr 15 '23 at 15:58
  • @cafce25 That's a fair point. I will clarify in my question that I mean `Option<Box<HashMap<K, V>>>` or similar chains of pointer types and Option. – Redirectk Apr 15 '23 at 17:49
  • @willtunnels It seems to me that creating a `Box<HashMap<K, V>>` still requires an empty HashMap in memory, plus an additional 8 bytes for the Box itself? – Redirectk Apr 15 '23 at 17:49
  • @AhmedMasud That is a valid alternative. I edited my question to clarify that in my case access time is really important and the map lookup will probably be too slow. I haven't run any benchmark in this case so please correct me if I am wrong! – Redirectk Apr 15 '23 at 17:50
  • @Redirectk only a benchmark can give you a definitive answer, but remember that `HashMap` lookup is O(1), so looking in a single unified `HashMap` should be faster than looking up a `HashMap` in the `Vec` followed by looking up the key in this smaller `HashMap`. Of course this assumes that you don't need to iterate all elements of `HashMap` number `n`. – Jmb Apr 15 '23 at 18:34
  • @Redirectk Ah yes, that's obviously true. – willtunnels Apr 15 '23 at 23:03
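
Following up on the point that only a benchmark can settle this, here is a rough, hypothetical micro-benchmark comparing the nested layout with the flat `(index, key)` map. Sizes, fill ratio, and types are made-up assumptions; `Instant`-based timings are noisy, so a dedicated benchmarking harness would give more trustworthy numbers, and it should be run in release mode.

```rust
use std::collections::HashMap;
use std::hint::black_box;
use std::time::Instant;

// Hypothetical sizes: tune these to resemble the real data.
const ROWS: usize = 100_000;
const KEYS_PER_ROW: u32 = 4;
const FILLED_EVERY: usize = 10; // only every 10th row holds data

type Nested = Vec<Option<Box<HashMap<u32, u64>>>>;
type Flat = HashMap<(usize, u32), u64>;

/// Builds both layouts with identical contents.
fn build() -> (Nested, Flat) {
    let mut nested: Nested = vec![None; ROWS];
    let mut flat: Flat = HashMap::new();
    for i in (0..ROWS).step_by(FILLED_EVERY) {
        let mut row = HashMap::new();
        for k in 0..KEYS_PER_ROW {
            let v = i as u64 * 10 + k as u64;
            row.insert(k, v);
            flat.insert((i, k), v);
        }
        nested[i] = Some(Box::new(row));
    }
    (nested, flat)
}

/// Index the Vec, then (for non-empty rows only) hash into the row's map.
fn sum_nested(nested: &Nested) -> u64 {
    let mut total = 0;
    for row in nested {
        for k in 0..KEYS_PER_ROW {
            if let Some(v) = row.as_deref().and_then(|m| m.get(&k)) {
                total += *v;
            }
        }
    }
    total
}

/// One hash lookup per probe: (index, key) hashed together.
fn sum_flat(flat: &Flat) -> u64 {
    let mut total = 0;
    for i in 0..ROWS {
        for k in 0..KEYS_PER_ROW {
            if let Some(v) = flat.get(&(i, k)) {
                total += *v;
            }
        }
    }
    total
}

fn main() {
    let (nested, flat) = build();

    let start = Instant::now();
    let a = black_box(sum_nested(black_box(&nested)));
    let nested_time = start.elapsed();

    let start = Instant::now();
    let b = black_box(sum_flat(black_box(&flat)));
    let flat_time = start.elapsed();

    assert_eq!(a, b, "both layouts must hold the same data");
    println!("nested: {nested_time:?}, flat: {flat_time:?}");
}
```

The result depends heavily on the access pattern: this scan probes every row, so the nested layout skips empty rows without hashing, while the flat map hashes every probe. Random lookups into mostly filled rows may favor the other layout.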