
I have a string: "abcd" and I want to:

  • Iterate its prefixes from shortest to longest:

    "", "a", "ab", "abc", "abcd"

  • Iterate its prefixes from longest to shortest:

    "abcd", "abc", "ab", "a", ""

  • Iterate its suffixes from shortest to longest:

    "", "d", "cd", "bcd", "abcd"

  • Iterate its suffixes from longest to shortest:

    "abcd", "bcd", "cd", "d", ""

You don't say what you need these for, so it's impossible to know whether you want byte prefixes, codepoint prefixes or grapheme prefixes. Your only examples use ASCII, where all three of these things are equivalent. – BurntSushi5 Aug 19 '21 at 10:30

1 Answer


Strings are more complicated than one might expect

  • To match human intuition, you usually want to treat a string as a sequence of 0 or more grapheme clusters.
  • A grapheme cluster is a sequence of 1 or more Unicode code points.
  • In the UTF-8 encoding, a code point is represented as a sequence of 1, 2, 3 or 4 bytes.
  • Both String and str in Rust use UTF-8 to represent strings, and their indexes are byte offsets.
  • Slicing through the middle of a code point makes no sense and would produce invalid UTF-8. Rust chooses to panic instead:
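#[cfg(test)]
mod tests {
    #[test]
    #[should_panic(expected = "byte index 2 is not a char boundary; it is inside '\\u{306}' (bytes 1..3) of `y̆`")]
    fn bad_index() {
        // "y̆" is 'y' (1 byte) followed by the combining breve U+0306 (2 bytes).
        let y = "y̆";
        // Byte offset 2 falls inside U+0306, so this slice panics.
        let _ = &y[2..];
    }
}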
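To make the byte / code point distinction from the list above concrete, here is a small sketch using only the standard library. The helper names sizes and suffix_from are made up for this illustration; str::get is the non-panicking counterpart of slice indexing and returns None when an offset is not on a code point boundary.

```rust
/// Returns (length in bytes, number of code points) for `s`.
/// Hypothetical helper, named for this illustration only.
fn sizes(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

/// Returns the suffix starting at byte offset `start`, or `None` if
/// `start` is out of range or not on a code point boundary.
/// Hypothetical helper, named for this illustration only.
fn suffix_from(s: &str, start: usize) -> Option<&str> {
    s.get(start..)
}

fn main() {
    // 'y' followed by the combining breve U+0306: one grapheme cluster,
    // two code points, three bytes.
    let y = "y\u{306}";
    println!("{:?}", sizes(y));
    // Offset 1 is a boundary (start of U+0306), offset 2 is not.
    println!("{:?}", suffix_from(y, 1));
    println!("{:?}", suffix_from(y, 2));
    println!("{}", y.is_char_boundary(2));
}
```

Counting grapheme clusters is not possible with the standard library alone, which is why the sketch stops at code points.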

A solution

Warning: this code works at the code point level and is grapheme cluster oblivious.

From shortest to longest:

use core::iter;

pub fn prefixes(s: &str) -> impl Iterator<Item = &str> + DoubleEndedIterator {
    s.char_indices()
        .map(move |(pos, _)| &s[..pos])
        .chain(iter::once(s))
}

pub fn suffixes(s: &str) -> impl Iterator<Item = &str> + DoubleEndedIterator {
    s.char_indices()
        .map(move |(pos, _)| &s[pos..])
        .chain(iter::once(""))
        .rev()
}

In reverse (longest to shortest), call .rev() on either iterator:

prefixes(s).rev()
suffixes(s).rev()
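As a quick check, the following sketch runs all four orderings from the question. The two functions are repeated from the answer so that the snippet compiles on its own; the main function and the variable names in it are only for illustration. The last line shows the code point caveat from the warning: a single grapheme cluster made of two code points is split into two steps.

```rust
use core::iter;

pub fn prefixes(s: &str) -> impl Iterator<Item = &str> + DoubleEndedIterator {
    s.char_indices()
        .map(move |(pos, _)| &s[..pos])
        .chain(iter::once(s))
}

pub fn suffixes(s: &str) -> impl Iterator<Item = &str> + DoubleEndedIterator {
    s.char_indices()
        .map(move |(pos, _)| &s[pos..])
        .chain(iter::once(""))
        .rev()
}

fn main() {
    let s = "abcd";

    // Shortest to longest.
    let p: Vec<&str> = prefixes(s).collect();
    let x: Vec<&str> = suffixes(s).collect();
    println!("{:?}", p); // ["", "a", "ab", "abc", "abcd"]
    println!("{:?}", x); // ["", "d", "cd", "bcd", "abcd"]

    // Longest to shortest.
    let p_rev: Vec<&str> = prefixes(s).rev().collect();
    let x_rev: Vec<&str> = suffixes(s).rev().collect();
    println!("{:?}", p_rev); // ["abcd", "abc", "ab", "a", ""]
    println!("{:?}", x_rev); // ["abcd", "bcd", "cd", "d", ""]

    // 'y' + combining breve is one grapheme cluster but two code points,
    // so the bare "y" shows up as an intermediate prefix.
    let g: Vec<&str> = prefixes("y\u{306}").collect();
    println!("{:?}", g);
}
```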


See also: How to iterate prefixes or suffixes of vec or slice in rust?