29

I've written the following code (+ demo) to remove entries from a HashMap based on value. It works, but I feel like I'm struggling against the borrow-checker with the use of:

  • clone() to avoid two references to the same set of keys
  • an extra let tmp = binding to increase the lifetime of my temp value

use std::collections::HashMap;

fn strip_empties(x: &mut HashMap<String, i8>) {
    let tmp = x.clone();
    let empties = tmp
         .iter()
         .filter(|&(_, &v)| v == 0)
         .map(|(k, _)| k);

    for k in empties { x.remove(k); }
}

fn main() {
    let mut x: HashMap<String, i8> = HashMap::new();
    x.insert("a".to_string(), 1);
    x.insert("b".to_string(), 0);
    strip_empties(&mut x);

    println!("Now down to {:?}" , x);
}

Is there a cleaner, more idiomatic way to accomplish this?

Bosh

3 Answers

52

The other answers are outdated. You can now use HashMap::retain (stable since Rust 1.18) to keep only the elements you are interested in. You specify the elements to keep using a closure that returns true for each entry to retain.

x.retain(|_, v| *v != 0);
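
As a minimal sketch, here is how that one-liner might slot into the question's strip_empties function and demo. The function and variable names are taken from the question; nothing else is assumed beyond the standard library.

```rust
use std::collections::HashMap;

// Keep only the entries whose value is non-zero.
// `retain` removes every entry for which the closure returns false, in place.
fn strip_empties(x: &mut HashMap<String, i8>) {
    x.retain(|_, v| *v != 0);
}

fn main() {
    let mut x: HashMap<String, i8> = HashMap::new();
    x.insert("a".to_string(), 1);
    x.insert("b".to_string(), 0);
    strip_empties(&mut x);

    println!("Now down to {:?}", x); // prints: Now down to {"a": 1}
}
```

No clone() of the map and no temporary binding are needed, because the closure only receives references to one entry at a time.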
dimo414
Johannes
  • An example showing how to solve the problem in OPs code would go a long way to making this a good answer. – Shepmaster Sep 30 '18 at 21:23
  • meta question: is there a way to somehow push a more updated answer to the top? e.g., perhaps changing this to the accepted answer? – mallwright Sep 29 '20 at 11:08
16

Why the mutation of the HashMap? Just create a new one (all hail immutability):

fn strip_empties(x: HashMap<String, i8>) -> HashMap<String, i8> {
    // Consume the map, keep the non-zero entries, and collect them into a new map.
    x.into_iter()
        .filter(|&(_, v)| v != 0)
        .collect()
}
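
If the caller still needs the original map afterwards, the same filter-and-collect idea can be applied to a borrowed map instead of a consumed one, at the cost of cloning each kept key. The following is only a sketch of that variant; the name stripped_copy is hypothetical and not part of the original answer.

```rust
use std::collections::HashMap;

// Hypothetical helper: builds a filtered copy and leaves the source map untouched.
fn stripped_copy(x: &HashMap<String, i8>) -> HashMap<String, i8> {
    x.iter()
        .filter(|&(_, &v)| v != 0)
        // Only the keys of the kept entries are cloned; the values are Copy.
        .map(|(k, &v)| (k.clone(), v))
        .collect()
}

fn main() {
    let mut original = HashMap::new();
    original.insert("a".to_string(), 1);
    original.insert("b".to_string(), 0);

    let stripped = stripped_copy(&original);

    // `original` is still usable here because it was only borrowed.
    println!(
        "original: {} entries, stripped: {} entries",
        original.len(),
        stripped.len()
    );
}
```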

Edit: Why this is feasible.

Of course you have to consider your use case. The best approach may vary depending on whether you have a large HashMap and whether you filter out many or few elements. Let's compare the implementations.

use std::collections::HashMap;

fn strip_empties_mutable(x: &mut HashMap<String, i8>) {
    let empties: Vec<_> = x
        .iter()
        .filter(|&(_, &v)| v == 0)
        .map(|(k, _)| k.clone())
        .collect();
    for empty in empties { x.remove(&empty); }
}

fn strip_empties_immutable(x: HashMap<String, i8>) -> HashMap<String, i8> {
    x.into_iter()
        .filter(|&(_, v)| v != 0)
        .collect()
}

fn build_hashmap() -> HashMap<String, i8> {
    let mut map = HashMap::new();
    // One entry per lowercase letter; the value is 0 or 1 depending on the character code.
    for chr in "abcdefghijklmnopqrstuvwxyz".chars() {
        map.insert(chr.to_string(), chr as i8 % 2);
    }
    map
}

#[cfg(mutable)]
fn main() {
    let mut map = build_hashmap();
    strip_empties_mutable(&mut map);
    println!("Now down to {:?}" , map);
}

#[cfg(immutable)]
fn main() {
    let mut map = build_hashmap();
    map = strip_empties_immutable(map);
    println!("Now down to {:?}" , map);
}

Save this as hashmap.rs and run:

rustc --cfg mutable -O -o mutable hashmap.rs
rustc --cfg immutable -O -o immutable hashmap.rs

If we compare the runtimes of the two binaries (e.g. using perf stat -r 1000 ./mutable and perf stat -r 1000 ./immutable), we don't see significant differences.

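But let's look at the allocator activity (callgrind reports instruction counts per function, which serve here as a rough proxy for the number of allocations):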

valgrind --tool=callgrind --callgrind-out-file=callgrind_mutable ./mutable
valgrind --tool=callgrind --callgrind-out-file=callgrind_immutable ./immutable
callgrind_annotate callgrind_mutable | grep 'je_.*alloc'
callgrind_annotate callgrind_immutable | grep 'je_.*alloc'
  • callgrind_mutable:

    7,000  ???:je_arena_malloc_small [$HOME/hashmap/mutable]
    6,457  ???:je_arena_dalloc_bin_locked [$HOME/hashmap/mutable]
    4,800  ???:je_mallocx [$HOME/hashmap/mutable]
    3,903  ???:je_sdallocx [$HOME/hashmap/mutable]
    2,520  ???:je_arena_dalloc_small [$HOME/hashmap/mutable]
      502  ???:je_rallocx [$HOME/hashmap/mutable]
      304  ???:je_arena_ralloc [$HOME/hashmap/mutable]
    
  • callgrind_immutable:

    5,114  ???:je_arena_malloc_small [$HOME/hashmap/immutable]
    4,725  ???:je_arena_dalloc_bin_locked [$HOME/hashmap/immutable]
    3,669  ???:je_mallocx [$HOME/hashmap/immutable]
    2,980  ???:je_sdallocx [$HOME/hashmap/immutable]
    1,845  ???:je_arena_dalloc_small [$HOME/hashmap/immutable]
      158  ???:je_rallocx [$HOME/hashmap/immutable]
    

And this is not very surprising, as the clone() calls in the mutable approach allocate memory as well. Of course, the mutable version might yield a HashMap with a larger capacity.
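
To illustrate the capacity point, here is a small sketch, assuming a map that was allocated with far more room than it ends up needing. The helper name capacities_after_strip and the sizes are made up for the demonstration, and the exact capacity values are implementation details of the standard library, so none are shown.

```rust
use std::collections::HashMap;

// Hypothetical helper: strips zero-valued entries in place and reports
// (len, capacity before shrinking, capacity after shrinking).
fn capacities_after_strip() -> (usize, usize, usize) {
    // Assumed sizes: room for 1024 entries, but only 10 are inserted.
    let mut map: HashMap<String, i8> = HashMap::with_capacity(1024);
    for i in 0..10i8 {
        map.insert(i.to_string(), i % 2);
    }

    // Same approach as strip_empties_mutable: collect the keys, then remove them.
    let empties: Vec<String> = map
        .iter()
        .filter(|&(_, &v)| v == 0)
        .map(|(k, _)| k.clone())
        .collect();
    for empty in empties {
        map.remove(&empty);
    }

    // Removing entries does not release the backing table.
    let before = map.capacity();
    // Explicitly ask the map to give back unused space.
    map.shrink_to_fit();
    let after = map.capacity();

    (map.len(), before, after)
}

fn main() {
    let (len, before, after) = capacities_after_strip();
    println!("len = {}, capacity before = {}, after = {}", len, before, after);
}
```

If the leftover capacity matters, calling shrink_to_fit() after the in-place removal is one way to reclaim it; collecting into a new map sizes the new table for the remaining entries instead.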

  • While this looks really nice on paper, it'll involve a lot of deallocations and allocations for the HashMap entries. That's why a mutation might be preferable. – sellibitze Mar 07 '15 at 12:05
  • I think it will only involve one allocation and deallocation (for the whole backing table). – huon Mar 07 '15 at 22:22
  • Can you give me a sense of why you call this "immutable", when it consumes (destroys) the source hashmap? I understand the approach and like it -- but I'm trying to figure out why "immutable" is the right word for it. – Bosh Mar 08 '15 at 00:32
  • @Bosh In this case the source HashMap is destroyed, yes. But it could be retained in its original form if another variable was used. So it's immutable as the original HashMap is not mutated. –  Mar 08 '15 at 08:54
  • @lummax: My understanding is that even if you used another variable, as in `let mut stripped_map = strip_empties_immutable(map);`, the original `map` variable would no longer be usable, because of the behavior of `into_iter` (which consumes the thing). Is that right? – Bosh Mar 08 '15 at 22:23
  • @Bosh: Of course. The `into_iter()` is consuming (`strip_empties()` is consuming). But that could easily be changed into a reference. This is about if it's ok to construct a new Hashmap. –  Mar 09 '15 at 09:03
  • What about usage of `drain` for inplace filter? – fghj Aug 18 '17 at 16:50
  • There's now a tracking issue to add drain_filter to HashMap. It's not merged at time of writing but might be soon: https://github.com/rust-lang/rust/issues/59618 – Chris Jul 16 '20 at 09:50
5

There is no way to delete values from a HashMap during iteration (neither via remove nor via the Entry API) because of borrowing restrictions, so your idea (collecting the keys to remove) is pretty close to the right solution.

You just don't need to clone the whole hash table; it is sufficient to collect copies of only the keys:

fn strip_empties(x: &mut HashMap<String, i8>) {
    let empties: Vec<_> = x
         .iter()
         .filter(|&(_, &v)| v == 0)
         .map(|(k, _)| k.clone())
         .collect();
    for empty in empties { x.remove(&empty); }
}
swizard
  • This raises the interesting question why `HashMap` does not have an iterator that returns `Entry` (or `(Key, Entry)` tuples). I can't see any reason why this should not be possible. Does anyone know if this is just a case of "Well, no-one has bothered to implement it yet"? – fjh Mar 07 '15 at 14:52
  • `HashMap`'s `iter` function (as used above) *does* create an iterator over `Key, Entry` tuples. I may be misunderstanding your question... – Bosh Mar 08 '15 at 00:34
  • Different "Entry". fjh is talking about std::collections::hash_map::Entry and I have the same question. I think Bosh is just using the word Entry generically to mean the value in the map. – Clayton Rabenda Apr 30 '22 at 04:42