46

I looked at the Rust docs for String but I can't find a way to extract a substring.

Is there a method like JavaScript's substr in Rust? If not, how would you implement it?

str.substr(start[, length])

The closest is probably slice_unchecked but it uses byte offsets instead of character indexes and is marked unsafe.

Shepmaster
laktak

9 Answers

71

For characters, you can use s.chars().skip(pos).take(len):

fn main() {
    let s = "Hello, world!";
    let ss: String = s.chars().skip(7).take(5).collect();
    println!("{}", ss);
}

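Beware of the definition of Unicode characters though: a Rust char is a Unicode scalar value, which is not always what a reader perceives as a single character (a grapheme cluster).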
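
To make that caveat concrete, here is a small sketch using only the standard library. The helper name char_substring is made up for illustration; it just wraps the skip/take chain above. It shows that chars() counts scalar values, so a letter followed by a combining mark counts as two chars.

```rust
/// Hypothetical helper (not part of std): collect `len` chars
/// starting at char index `pos` into a new String.
fn char_substring(s: &str, pos: usize, len: usize) -> String {
    s.chars().skip(pos).take(len).collect()
}

fn main() {
    // 'y' followed by U+0306 (combining breve), then "es".
    let s = "y\u{306}es";

    // Four scalar values and five bytes, although a reader sees three characters.
    println!("chars: {}, bytes: {}", s.chars().count(), s.len());

    // Taking one char yields a bare "y"; the combining mark is left behind.
    println!("{}", char_substring(s, 0, 1));

    // For plain ASCII text, char indexes and perceived characters agree.
    println!("{}", char_substring("Hello, world!", 7, 5)); // world
}
```

If you need to count what the user perceives as characters, char-based indexing is not enough on its own.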

For bytes, you can use the slice syntax:

fn main() {
    let s = b"Hello, world!";
    let ss = &s[7..12];
    println!("{:?}", ss);
}
WiSaGaN
  • Didn't see `chars()`, thanks! Would it also be possible to map the char index to a byte offset and create a slice from that? – laktak May 11 '16 at 09:31
  • 3
    @laktak, you can use `str::char_indices` for that. https://doc.rust-lang.org/std/primitive.str.html#method.char_indices – WiSaGaN May 11 '16 at 09:41
  • In the second example I think you meant `b"Hello, world!"` ? – Jonas Berlin Jun 14 '20 at 19:35
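
One way the char_indices suggestion from the comments could look is sketched below. The names byte_offset and char_slice are hypothetical, and clamping out-of-range values to the end of the string is an assumption of this sketch, not something the comments specify.

```rust
/// Hypothetical helper: byte offset of the char at index `idx`.
/// Returns `s.len()` when `idx` equals the number of chars,
/// and `None` when `idx` is past the end.
fn byte_offset(s: &str, idx: usize) -> Option<usize> {
    s.char_indices()
        .map(|(i, _)| i)
        .chain(std::iter::once(s.len()))
        .nth(idx)
}

/// Borrow `len` chars starting at char index `start`, without allocating.
/// Out-of-range values are clamped to the end of the string (an assumption).
fn char_slice(s: &str, start: usize, len: usize) -> &str {
    let begin = byte_offset(s, start).unwrap_or(s.len());
    let end = byte_offset(s, start.saturating_add(len)).unwrap_or(s.len());
    &s[begin..end]
}

fn main() {
    let s = "Hello, wörld!";
    println!("{}", char_slice(s, 7, 5)); // wörld
    // 'ö' takes two bytes, so the char at index 9 starts at byte 10.
    println!("{:?}", byte_offset(s, 9)); // Some(10)
}
```

Both lookups walk the string from the start, so this is still O(n) in the length of the string, but the result is a borrowed slice rather than a new String.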
16

You can use the as_str method on the Chars iterator to get back a &str slice after you have advanced the iterator. So to skip the first start chars, you can call

let s = "Some text to slice into";
let start = 5; // number of chars to skip (example value)
let mut iter = s.chars();
// nth(n) consumes n + 1 items, so nth(start - 1) skips exactly `start` chars
if start > 0 {
    iter.nth(start - 1);
}
let slice = iter.as_str(); // get back a slice of the rest of the iterator

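Now if you also want to limit the length, you first need to figure out the byte position of the character at index length:

// Caveat: if fewer than `length` chars remain, nth returns None and
// unwrap_or(0) yields an empty slice rather than the rest of the string.
let end_pos = slice.char_indices().nth(length).map(|(n, _)| n).unwrap_or(0);
let substr = &slice[..end_pos];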

This might feel a little roundabout, but Rust is not hiding anything from you that might take up CPU cycles. That said, I wonder why there's no crate yet that offers a substr method.

Shepmaster
oli_obk
  • "not hiding anything from you that might take up CPU cycles" - can you explain why substr might be more expensive than any of the trim functions it has? – laktak May 11 '16 at 14:25
  • 3
    Well... the trim functions are expected to get rid of all whitespace they encounter. This is an O(n) operation by definition. But with a `substr` method the user might assume that it is O(1), because they are passing indices. – oli_obk May 11 '16 at 14:32
  • `but Rust is not hiding anything from you that might take up CPU cycles.` This is not true since `Vec::insert` exists – Dave Halter Sep 22 '21 at 19:38
6

This code performs both substring-ing and string-slicing, without panicking or allocating:

use std::ops::{Bound, RangeBounds};

trait StringUtils {
    fn substring(&self, start: usize, len: usize) -> &str;
    fn slice(&self, range: impl RangeBounds<usize>) -> &str;
}

impl StringUtils for str {
    fn substring(&self, start: usize, len: usize) -> &str {
        let mut char_pos = 0;
        let mut byte_start = 0;
        let mut it = self.chars();
        loop {
            if char_pos == start { break; }
            if let Some(c) = it.next() {
                char_pos += 1;
                byte_start += c.len_utf8();
            }
            else { break; }
        }
        char_pos = 0;
        let mut byte_end = byte_start;
        loop {
            if char_pos == len { break; }
            if let Some(c) = it.next() {
                char_pos += 1;
                byte_end += c.len_utf8();
            }
            else { break; }
        }
        &self[byte_start..byte_end]
    }
    fn slice(&self, range: impl RangeBounds<usize>) -> &str {
        // Saturating arithmetic keeps inverted or extreme ranges from panicking.
        let start = match range.start_bound() {
            Bound::Included(bound) => *bound,
            Bound::Excluded(bound) => bound.saturating_add(1),
            Bound::Unbounded => 0,
        };
        let len = match range.end_bound() {
            Bound::Included(bound) => bound.saturating_add(1),
            Bound::Excluded(bound) => *bound,
            Bound::Unbounded => self.len(),
        }.saturating_sub(start);
        self.substring(start, len)
    }
}

fn main() {
    let s = "abcdèfghij";
    // All three statements should print:
    // "abcdè, abcdèfghij, dèfgh, dèfghij."
    println!("{}, {}, {}, {}.",
        s.substring(0, 5),
        s.substring(0, 50),
        s.substring(3, 5),
        s.substring(3, 50));
    println!("{}, {}, {}, {}.",
        s.slice(..5),
        s.slice(..50),
        s.slice(3..8),
        s.slice(3..));
    println!("{}, {}, {}, {}.",
        s.slice(..=4),
        s.slice(..=49),
        s.slice(3..=7),
        s.slice(3..));
}
Shepmaster
carlo.milanesi
5

For my_string.substring(start, len)-like syntax, you can write a custom trait:

trait StringUtils {
    fn substring(&self, start: usize, len: usize) -> Self;
}

impl StringUtils for String {
    fn substring(&self, start: usize, len: usize) -> Self {
        self.chars().skip(start).take(len).collect()
    }
}

// Usage:
fn main() {
    let phrase: String = "this is a string".to_string();
    println!("{}", phrase.substring(5, 8)); // prints "is a str"
}
Shepmaster
user1656730
1

The solution given by oli_obk does not handle the last index of the string slice. It can be fixed with .chain(once(s.len())).

The function substr below implements a substring slice with error handling. If an invalid index is passed, the valid part of the string slice is returned in the Err variant. All corner cases should be handled correctly.

fn substr(s: &str, begin: usize, length: Option<usize>) -> Result<&str, &str> {
    use std::iter::once;
    let mut itr = s.char_indices().map(|(n, _)| n).chain(once(s.len()));
    let beg = itr.nth(begin);
    if beg.is_none() {
        return Err("");
    } else if length == Some(0) {
        return Ok("");
    }
    let end = length.map_or(Some(s.len()), |l| itr.nth(l-1));
    if let Some(end) = end {
        return Ok(&s[beg.unwrap()..end]);
    } else {
        return Err(&s[beg.unwrap()..s.len()]);
    }
}
let s = "abc";
assert_eq!(Ok("bc"), substr(s, 1, Some(2)));
assert_eq!(Ok("c"), substr(s, 2, Some(1)));
assert_eq!(Ok("c"), substr(s, 2, None));
assert_eq!(Err("c"), substr(s, 2, Some(99)));
assert_eq!(Ok(""), substr(s, 2, Some(0)));
assert_eq!(Err(""), substr(s, 5, Some(4)));

Note that this does not handle Unicode grapheme clusters. For example, "y̆es" contains 4 Unicode chars but 3 grapheme clusters. The unicode-segmentation crate solves this problem: grapheme clusters are handled correctly if the part

let mut itr = s.char_indices()...

is replaced with

use unicode_segmentation::UnicodeSegmentation;
let mut itr = s.grapheme_indices(true)...

Then the following also works:

assert_eq!(Ok("y̆"), substr("y̆es", 0, Some(1)));
tolvanea
1

Knowing the various forms of the slice syntax might be helpful for some readers. Note that all of these ranges are byte offsets, not character indexes, and slicing panics if an offset does not fall on a UTF-8 character boundary.

  • Reference to a part of a string
    &s[6..11]
  • If you start at index 0, you can omit the value
    &s[0..1] is equivalent to &s[..1]
  • Likewise, if your substring runs to the last byte of the string
    &s[3..s.len()] is equivalent to &s[3..]
  • This also applies when the slice encompasses the entire string
    &s[..]
  • You can also use the inclusive range operator to include the last value
    &s[..=1]

Link to docs: https://doc.rust-lang.org/book/ch04-03-slices.html
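
As a quick check of the equivalences listed above, here is a minimal sketch using an ASCII string, where byte offsets and character indexes happen to coincide. The last two lines use str::get, which returns None for an invalid range instead of panicking.

```rust
fn main() {
    let s = "Hello, world!";

    // A reference to part of the string: bytes 6 up to, but not including, 11.
    assert_eq!(&s[6..11], " worl");

    // Omitting the start or the end is the same as writing 0 or s.len().
    assert_eq!(&s[0..1], &s[..1]);
    assert_eq!(&s[3..s.len()], &s[3..]);
    assert_eq!(&s[..], s);

    // An inclusive range also includes the byte at the end index.
    assert_eq!(&s[..=1], "He");

    // Indexing panics on an invalid range; `get` returns None instead.
    assert_eq!(s.get(6..11), Some(" worl"));
    assert_eq!(s.get(6..99), None);

    println!("all slice forms behave as described");
}
```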

tenxsoydev
-1

I'm not very experienced in Rust, but I gave it a try. If someone can correct my answer, please don't hesitate.

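fn substring(string: &str, start: usize, end: usize) -> String {
    // `end` is inclusive, so the result covers end - start + 1 chars.
    // A single pass replaces the repeated chars().nth(i) lookups, and
    // out-of-range indices shorten the result instead of panicking.
    let len = end.saturating_add(1).saturating_sub(start);
    string.chars().skip(start).take(len).collect()
}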

Here is a playground

Anders Evensen
-1

I couldn't find the exact substr implementation that I'm familiar with from other programming languages such as JavaScript and Dart.

Here is a possible implementation of a substr method for &str (a String can use it via as_str()).

Let's define a trait so that we can add methods to built-in types (like extensions in Dart).

trait Substr {
    fn substr(&self, start: usize, end: usize) -> String;
}

Then implement this trait for &str:

impl<'a> Substr for &'a str {
    fn substr(&self, start: usize, end: usize) -> String {
        // An empty or inverted range yields an empty string.
        if start >= end {
            return String::new();
        }

        self.chars().skip(start).take(end - start).collect()
    }
}

Try:

fn main() {
    let string = "Hello, world!";
    let substring = string.substr(0, 4);
    println!("{}", substring); // Hell
}
theiskaa
-2

You can also use .to_string()[ <range> ].

This example takes a slice of a copy of the original string (to_string() allocates a new String), then mutates the original to demonstrate that the slice is unaffected.

let mut s: String = "Hello, world!".to_string();

let substring: &str = &s.to_string()[..6];

s.replace_range(..6, "Goodbye,");

println!("{}   {} universe!", s, substring);

//    Goodbye, world!   Hello, universe!
Shepmaster
Ian MacDonald
  • 1
    This is not like JavaScript: `"ウィキペディアへようこそ".substr(1, 3)` vs `&"ウィキペディアへようこそ"[1..3]`. One "works", the other doesn't. – Shepmaster Mar 04 '20 at 16:56
  • 1
    @Shepmaster Correct. This does not work for all characters. The `skip.take.collect` method is ideal. – Ian MacDonald Mar 04 '20 at 17:21
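
To illustrate the difference discussed in these comments, here is a minimal sketch with the same string. It uses str::get and str::is_char_boundary from the standard library to show why the byte range 1..3 fails, and the skip/take/collect chain to get the char-based result.

```rust
fn main() {
    let s = "ウィキペディアへようこそ";

    // Each of these characters takes 3 bytes in UTF-8, so byte offset 1
    // falls in the middle of the first character.
    println!("{}", s.is_char_boundary(1)); // false
    println!("{}", s.is_char_boundary(3)); // true

    // `&s[1..3]` would panic here; `get` reports the problem as None.
    println!("{:?}", s.get(1..3)); // None

    // Counting chars instead of bytes skips one character and takes three.
    let sub: String = s.chars().skip(1).take(3).collect();
    println!("{}", sub); // ィキペ
}
```

For this particular string the char-based result matches what JavaScript's substr(1, 3) returns, since every character here is a single UTF-16 code unit; that is not guaranteed for characters outside the Basic Multilingual Plane.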