Strings are more complicated than one might expect
- To match human intuition, you usually want to treat a string as a sequence of zero or more grapheme clusters.
- A grapheme cluster is a sequence of one or more Unicode code points.
- In the UTF-8 encoding, a code point is represented as a sequence of 1, 2, 3 or 4 bytes.
- Both String and str in Rust use UTF-8 to represent strings, and indexes are byte offsets.
- Slicing in the middle of a code point makes no sense and would produce invalid UTF-8. Rust chooses to panic instead:
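#[cfg(test)]
mod tests {
    #[test]
    #[should_panic(expected = "byte index 2 is not a char boundary; it is inside '\\u{306}' (bytes 1..3) of `y̆`")]
    fn bad_index() {
        // "y̆" is 'y' (byte 0) followed by U+0306 COMBINING BREVE (bytes 1..3)
        let y = "y̆";
        // Byte 2 lies inside U+0306, so this slice panics
        let _ = &y[2..];
    }
}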
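The three layers from the list above can be observed directly on the same string. The following is a small sketch; `char_boundaries` is a hypothetical helper written for illustration, while `len`, `chars`, `is_char_boundary` and `get` are standard `str` methods. It also shows that `str::get` returns `None` instead of panicking when a range does not fall on char boundaries.

```rust
/// Hypothetical helper (not part of std): every byte offset at which
/// `s` can be sliced without panicking.
fn char_boundaries(s: &str) -> Vec<usize> {
    // is_char_boundary treats 0 and s.len() as boundaries.
    (0..=s.len()).filter(|&i| s.is_char_boundary(i)).collect()
}

fn main() {
    // 'y' (1 byte) followed by U+0306 COMBINING BREVE (2 bytes in UTF-8):
    // one grapheme cluster, two code points, three bytes.
    let y = "y\u{306}";
    assert_eq!(y.len(), 3); // len() counts bytes
    assert_eq!(y.chars().count(), 2); // chars() yields code points

    // Offset 2 falls inside U+0306, so it is not a valid slice point.
    assert_eq!(char_boundaries(y), vec![0, 1, 3]);

    // `get` is the non-panicking counterpart of indexing.
    assert_eq!(y.get(2..), None);
    assert_eq!(y.get(1..), Some("\u{306}"));

    println!("{:?}", char_boundaries(y));
}
```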
A solution
Warning: this code works at the code point level and is oblivious to grapheme clusters.
From shortest to longest:
use core::iter;

pub fn prefixes(s: &str) -> impl Iterator<Item = &str> + DoubleEndedIterator {
    s.char_indices()
        .map(move |(pos, _)| &s[..pos])
        .chain(iter::once(s))
}

pub fn suffixes(s: &str) -> impl Iterator<Item = &str> + DoubleEndedIterator {
    s.char_indices()
        .map(move |(pos, _)| &s[pos..])
        .chain(iter::once(""))
        .rev()
}
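An equivalent way to express the same idea, offered here only as an alternative sketch rather than the answer's code, is to walk every byte offset and keep those for which `str::is_char_boundary` is true. Because offsets `0` and `s.len()` always count as boundaries, the empty string and `s` itself come out naturally without `chain(iter::once(..))`. The example also makes the warning above concrete: for "y̆" the prefix "y" splits the grapheme cluster.

```rust
/// Alternative sketch: prefixes, shortest to longest, found by testing
/// each byte offset with `is_char_boundary`.
pub fn prefixes(s: &str) -> impl DoubleEndedIterator<Item = &str> {
    (0..=s.len())
        .filter(move |&i| s.is_char_boundary(i))
        .map(move |i| &s[..i])
}

/// Alternative sketch: suffixes, shortest to longest. Offsets are walked
/// from the end, so the empty suffix comes first.
pub fn suffixes(s: &str) -> impl DoubleEndedIterator<Item = &str> {
    (0..=s.len())
        .rev()
        .filter(move |&i| s.is_char_boundary(i))
        .map(move |i| &s[i..])
}

fn main() {
    let y = "y\u{306}"; // 'y' + U+0306 COMBINING BREVE
    let p: Vec<&str> = prefixes(y).collect();
    let s: Vec<&str> = suffixes(y).collect();
    // Code-point level only: "y" and "\u{306}" each cut the cluster in half.
    assert_eq!(p, ["", "y", "y\u{306}"]);
    assert_eq!(s, ["", "\u{306}", "y\u{306}"]);
    println!("{:?} {:?}", p, s);
}
```

This checks every byte offset rather than only code point starts, but `is_char_boundary` is a constant-time check, so both versions are linear in the length of the string.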
In reverse (from longest to shortest):
prefixes(s).rev()
suffixes(s).rev()
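A short usage sketch, with the answer's two functions repeated so the example compiles on its own, showing both orders on an ASCII input:

```rust
use core::iter;

pub fn prefixes(s: &str) -> impl Iterator<Item = &str> + DoubleEndedIterator {
    s.char_indices()
        .map(move |(pos, _)| &s[..pos])
        .chain(iter::once(s))
}

pub fn suffixes(s: &str) -> impl Iterator<Item = &str> + DoubleEndedIterator {
    s.char_indices()
        .map(move |(pos, _)| &s[pos..])
        .chain(iter::once(""))
        .rev()
}

fn main() {
    let s = "abc";
    // Shortest to longest.
    assert_eq!(prefixes(s).collect::<Vec<_>>(), ["", "a", "ab", "abc"]);
    assert_eq!(suffixes(s).collect::<Vec<_>>(), ["", "c", "bc", "abc"]);
    // Longest to shortest: both iterators are double-ended, so rev() is enough.
    assert_eq!(prefixes(s).rev().collect::<Vec<_>>(), ["abc", "ab", "a", ""]);
    assert_eq!(suffixes(s).rev().collect::<Vec<_>>(), ["abc", "bc", "c", ""]);
    println!("ok");
}
```

Returning `DoubleEndedIterator` is what makes `.rev()` available to callers without collecting into a `Vec` first.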
See also: How to iterate prefixes or suffixes of vec or slice in rust?