I'm using Rust 0.13 and am rather new to the language. I have a struct that should own its input string, but I also have code that works with slices of that string, stored in a field called work.

pub struct Lexer<'a> {
    input : Option<String>,
    work : &'a str,
    ...
}

My goal is to pass a string to the struct, have it create its own copy, and then create an initial slice pointing into that copy. Ideally, I could then move this slice around freely, since the memory backing it won't ever change.

pub fn input(&mut self, input : String) {
    self.input = Some(input.clone());
    self.work = self.input.unwrap().as_slice();
}

impl<'lex> Iterator<Token> for Lexer<'lex> {
    fn next(&mut self) -> Option<Token> {
        // ...Do work...
        match regex!("\\S").find(self.work) {
            Some((0, end)) => {
                // Cheap to move the view around
                self.work = self.work.slice_from(end);
            },
            _ => ()
        }
        // ... Do more work ...
    }
}

However, this doesn't work because the lifetime is too short:

error: borrowed value does not live long enough
    self.work = self.input.unwrap().as_slice();
                ^~~~~~~~~~~~~~~~~~~

I'm interpreting this to mean that self.input could change, invalidating self.work's view. Is this a reasonable interpretation?

Is there a way to specify that these fields are tied to each other somehow? I think if I could specify that Lexer.input is final this would work, but it doesn't look like Rust has a way to do this.

Edit: sample calling code

let mut lexer = lex::Lexer::new();

lexer.add("[0-9]+", Token::NUM);
lexer.add("\\+", Token::PLUS);

for line in io::stdin().lock().lines() {
    match line {
        Ok(input) => {
            lexer.input(input.as_slice());
            lexer.lex();
        },
        Err(e) => ()
    }
}
Matt Bryant
  • possible duplicate of [How can I provide a reference to a struct that is a sibling?](http://stackoverflow.com/questions/26349778/how-can-i-provide-a-reference-to-a-struct-that-is-a-sibling) – Shepmaster Dec 29 '14 at 18:59
  • "if I could specify that Lexer.input is final" - if you want a string that can never change, then that sounds like a `&str` ^_^. – Shepmaster Dec 29 '14 at 19:03
  • I agree with that, but I'm not really sure how to handle the `String` backing `input : &str`. I found that doing it that way ended up tying my `Lexer` to the same lifetime as whatever gave it `input`, so I thought about how I'd do it in C and switched to this way. – Matt Bryant Dec 29 '14 at 19:09
  • "that way ended up tying my Lexer to the same lifetime" - yup, you *want* your Lexer to be tied to the same lifetime as the `&str` - that's the only way Rust can ensure that you don't lex a string that no longer exists and that it doesn't change underneath you. – Shepmaster Dec 29 '14 at 20:00
  • That's why I'm trying to move the input into the `Lexer`, to let me use the invariants Rust is trying to provide. For example, I might like to call `Lexer.input()` on multiple different inputs from `stdin`, but if I do this with `&str` I somehow need the strings from `stdin` to have the same scope as the lexer struct, which doesn't appear to be feasible or at all what I want in terms of memory usage. – Matt Bryant Dec 29 '14 at 20:15
  • Could you expand on what you mean by "not feasible" or why you think the memory usage will be negative? We might be able to get a unique question out of this yet! – Shepmaster Dec 29 '14 at 20:27
  • I've added some example calling code to the end of my question. As far as I can tell, I could make this calling code work if `input` were scoped outside of the loop, but I believe this would also require me to keep every value of `input`, as otherwise I'd again have garbage slices. When I say "not feasible", I mean I don't see how I'd even implement this scoping change. – Matt Bryant Dec 29 '14 at 20:31

1 Answer

I think your issue can be solved by adding one more layer. You can have one layer that collects the rules of your lexer, and then you create a new struct that actually does the lexing. This is parallel to how the iterators in Rust are implemented themselves!

struct MetaLexer<'a> {
    rules: Vec<(&'a str, u32)>,
}

impl<'a> MetaLexer<'a> {
    fn new() -> MetaLexer<'a> { MetaLexer { rules: Vec::new() } }

    fn add_rule(&mut self, name: &'a str, val: u32) {
        self.rules.push((name, val));
    }

    fn lex<'r, 's>(&'r self, s: &'s str) -> Lexer<'a, 's, 'r> {
        Lexer {
            rules: &self.rules,
            work: s,
        }
    }
}

struct Lexer<'a : 'r, 's, 'r> {
    rules: &'r [(&'a str, u32)],
    work: &'s str,
}

impl<'a, 's, 'r> Iterator for Lexer<'a, 's, 'r> {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        for &(name, val) in self.rules.iter() {
            if self.work.starts_with(name) {
                self.work = &self.work[name.len()..];
                return Some(val);
            }
        }

        None
    }
}

fn main() {
    let mut ml = MetaLexer::new();
    ml.add_rule("hello", 10);
    ml.add_rule("world", 3);

    for input in ["hello", "world", "helloworld"].iter() {
        // So that we have an allocated string,
        // like io::stdin().lock().lines() might give us
        let input = input.to_string(); 

        println!("Input: '{}'", input);
        for token in ml.lex(&input) {
            println!("Token was: {}", token);
        }
    }
}

In fact, you could rename MetaLexer -> Lexer and Lexer -> LexerItems, and then you'd match the naming used by the iterators in the standard library.
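To connect this back to the sample calling code in the question, here is a minimal sketch of driving the same two-layer design from a line-based reader such as io::stdin().lock(). The lex_lines helper is a hypothetical name introduced only for illustration, the struct definitions repeat the ones above with the lifetime bound left to inference, and the syntax assumes a current stable Rust rather than 0.13.

```rust
use std::io::{self, BufRead};

struct MetaLexer<'a> {
    rules: Vec<(&'a str, u32)>,
}

impl<'a> MetaLexer<'a> {
    fn new() -> MetaLexer<'a> {
        MetaLexer { rules: Vec::new() }
    }

    fn add_rule(&mut self, name: &'a str, val: u32) {
        self.rules.push((name, val));
    }

    // Borrow the rules for 'r and the input for 's; neither is owned here.
    fn lex<'r, 's>(&'r self, s: &'s str) -> Lexer<'a, 's, 'r> {
        Lexer { rules: &self.rules, work: s }
    }
}

struct Lexer<'a, 's, 'r> {
    rules: &'r [(&'a str, u32)],
    work: &'s str,
}

impl<'a, 's, 'r> Iterator for Lexer<'a, 's, 'r> {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        for &(name, val) in self.rules.iter() {
            if self.work.starts_with(name) {
                // Cheap: only the view moves, nothing is copied.
                self.work = &self.work[name.len()..];
                return Some(val);
            }
        }
        None
    }
}

// Hypothetical helper (not part of the original answer): lex every line
// of a reader and collect the tokens found on each line.
fn lex_lines<R: BufRead>(ml: &MetaLexer<'_>, reader: R) -> io::Result<Vec<Vec<u32>>> {
    let mut all = Vec::new();
    for line in reader.lines() {
        // `input` is an owned String that lives until the end of this iteration.
        let input = line?;
        // The Lexer borrows `input` and is consumed before `input` is dropped.
        let tokens: Vec<u32> = ml.lex(&input).collect();
        all.push(tokens);
    }
    Ok(all)
}

fn main() -> io::Result<()> {
    let mut ml = MetaLexer::new();
    ml.add_rule("hello", 10);
    ml.add_rule("world", 3);

    let stdin = io::stdin();
    for tokens in lex_lines(&ml, stdin.lock())? {
        println!("{:?}", tokens);
    }
    Ok(())
}
```

Each Lexer lives only as long as the line it borrows, so no line has to be kept around after it has been lexed, while the rules in MetaLexer persist across all of them.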

If your question is really how do I keep references to the data read from stdin, that's a different question, and very far from your original statement.
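If the lexer really must own its input, as in the original struct, one commonly used alternative (a different technique from the two-layer design above) is to store a byte offset instead of a slice and re-create the slice on demand. The following is only a sketch under that assumption: OwningLexer and its pos field are hypothetical names, the rules are stored as owned Strings for simplicity, and the syntax again assumes a current stable Rust.

```rust
// Sketch only: the lexer owns `input` and tracks its position as an index,
// so the struct never has to hold a reference into its own field.
struct OwningLexer {
    rules: Vec<(String, u32)>,
    input: String,
    pos: usize,
}

impl OwningLexer {
    fn new() -> OwningLexer {
        OwningLexer { rules: Vec::new(), input: String::new(), pos: 0 }
    }

    fn add_rule(&mut self, name: &str, val: u32) {
        self.rules.push((name.to_string(), val));
    }

    // Take ownership of a new input and restart from its beginning.
    fn input(&mut self, input: String) {
        self.input = input;
        self.pos = 0;
    }
}

impl Iterator for OwningLexer {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        // The slice is re-created on every call; this is as cheap as moving a view.
        let work = &self.input[self.pos..];
        for (name, val) in &self.rules {
            if work.starts_with(name.as_str()) {
                // Advancing by the matched length keeps `pos` on a char boundary.
                self.pos += name.len();
                return Some(*val);
            }
        }
        None
    }
}

fn main() {
    let mut lexer = OwningLexer::new();
    lexer.add_rule("hello", 10);
    lexer.add_rule("world", 3);

    for input in ["hello", "world", "helloworld"].iter() {
        lexer.input(input.to_string());
        println!("Input: '{}'", input);
        // by_ref() lets the same lexer be reused for the next input.
        for token in lexer.by_ref() {
            println!("Token was: {}", token);
        }
    }
}
```

The trade-off is that the struct carries no lifetime parameters at all, at the cost of one slicing operation per call to next.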

Shepmaster
  • So I'd essentially be using a factory to make the lifetimes work out properly. I like it. Do you know of a reference that talks about "how do I keep references to the data read from stdin"? I haven't noticed anything similar that was not referring to an obsolete version, though possibly I haven't applied `.clone()` creatively enough. – Matt Bryant Dec 30 '14 at 01:37
  • @MattBryant I don't have any straight up references, but I've a few ideas I could put on an answer to a new question. And then you'd get other people's ideas too. – Shepmaster Dec 30 '14 at 02:10