5

I have a Regex with an unknown number of named groups with unknown names. I want to match a string against that regex and get a HashMap<&str, &str> with the group names as keys and the captured strings as values.

How can I do this? Will I have to use regex.captures(str).iter() and then somehow map and filter and collect into a map? Or is there some shortcut?

Anders

2 Answers

8

It is tricky because the regex can match the input more than once, and a capture group inside a repetition can match several times within a single match (only its last match is reported).

Maybe something like this (playground):

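use regex::Regex;
use std::collections::HashMap;

fn main() {
    let re = Regex::new(r"(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})").unwrap();
    let text = "2012-03-14";
    // Only the first match is considered; unwrap() panics if there is none.
    let caps = re.captures(text).unwrap();
    let dict: HashMap<&str, &str> = re
        .capture_names()
        // Drop the unnamed groups (including group 0, the whole match).
        .flatten()
        // Skip named groups that did not participate in the match.
        .filter_map(|n| Some((n, caps.name(n)?.as_str())))
        .collect();
    println!("{:#?}", dict);
}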

That outputs (HashMap iteration order is not fixed, so the keys may appear in a different order):

{
    "y": "2012",
    "d": "14",
    "m": "03"
}

The code is simple once you realize that the capture names are not available from the captures themselves, but from the parent Regex. You have to do the following:

  1. Call capture_names(), which returns an iterator of Option<&str> (unnamed groups yield None).
  2. flatten() the iterator, which removes the None entries and unwraps the &str values.
  3. filter_map() the capture names into tuples (name, value) of type (&str, &str). The filter is needed to remove captures that are not present (thanks to @Anders).
  4. collect()! This just works because HashMap<K, V> implements the trait FromIterator<(K, V)>, so an iterator of (&str, &str) collects into a HashMap<&str, &str>.
Shepmaster
rodrigo
  • 2
    This panics if a named group is missing. It can be fixed by using `filter_map(|n| Some((n, captures.name(n)?.as_str())))` instead of the `map`. – Anders Jan 18 '19 at 19:44
  • 2
    @Anders: Oh, you are right. I'll fix it as you suggested. Although a more idiomatic solution would be to create a `HashMap<&str, Option<&str>>` by using `map(|n| (n, caps.name(n).map(|m| m.as_str())))` instead. – rodrigo Jan 18 '19 at 19:52
  • Good suggestion with the option. – Anders Jan 18 '19 at 19:54
3

If the regex matches more than once, you can collect one map per match into a list like this:

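use regex::Regex;
use std::collections::HashMap;

fn main() {
    let re = Regex::new(r"(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})").unwrap();
    // One map per match found in the input.
    let all: Vec<HashMap<&str, &str>> = re
        .captures_iter("2012-01-12 , 2013-07-11 , 2014-09-14")
        .map(|caps| {
            re.capture_names()
                // Skip unnamed groups and named groups that did not participate.
                .filter_map(|o| o.and_then(|n| Some((n, caps.name(n)?.as_str()))))
                .collect()
        })
        .collect();

    println!("{:#?}", all);
}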
Shepmaster
Ömer Erden