
I have an app that deals with lexemes, which are composed into larger sentences. I am cloning these lexemes all over the place, but they are not cheap to clone, and many lexemes are repeated over and over in different sentences.

My plan is to write a "lexicon" that stores the lexemes and can be queried for them by their constituent parts. If a requested lexeme doesn't exist yet, the lexicon creates it; either way, it returns a reference to the stored lexeme. This way I can build my sentences out of references to lexemes rather than out of owned lexemes.

For simplicity, let's say that a lexeme is composed of two Strings. The code that follows illustrates what I want, but the test fails to compile: each call to get borrows the lexicon mutably, and the returned reference keeps that borrow alive, so I can only hold one such reference at a time. So my question is, what would be the correct strategy here?

#![feature(hash_set_entry)]

use std::collections::HashSet;

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Lexeme {
    name: String,
    text: String,
}

struct Lexicon {
    lexemes: HashSet<Lexeme>,
}

impl Lexicon {
    pub fn get(&mut self, name: String, text: String) -> &Lexeme {
        let lexeme = Lexeme { name, text };
        self.lexemes.get_or_insert(lexeme)
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_1() {
        let mut lexicon = Lexicon {
            lexemes: HashSet::new(),
        };
        let lex1 = lexicon.get("name1".to_string(), "text1".to_string());
        let _ = lexicon.get("name2".to_string(), "text2".to_string());
        let lex3 = lexicon.get("name1".to_string(), "text1".to_string());
        assert_eq!(lex1.name, lex3.name);
    }
}

Compile errors:

error[E0499]: cannot borrow `lexicon` as mutable more than once at a time
  --> src/lib.rs:32:17
   |
31 |         let lex1 = lexicon.get("name1".to_string(), "text1".to_string());
   |                    ------- first mutable borrow occurs here
32 |         let _ = lexicon.get("name2".to_string(), "text2".to_string());
   |                 ^^^^^^^ second mutable borrow occurs here
33 |         let lex3 = lexicon.get("name1".to_string(), "text1".to_string());
34 |         assert_eq!(lex1.name, lex3.name);
   |         --------------------------------- first borrow later used here

error[E0499]: cannot borrow `lexicon` as mutable more than once at a time
  --> src/lib.rs:33:20
   |
31 |         let lex1 = lexicon.get("name1".to_string(), "text1".to_string());
   |                    ------- first mutable borrow occurs here
32 |         let _ = lexicon.get("name2".to_string(), "text2".to_string());
33 |         let lex3 = lexicon.get("name1".to_string(), "text1".to_string());
   |                    ^^^^^^^ second mutable borrow occurs here
34 |         assert_eq!(lex1.name, lex3.name);
   |         --------------------------------- first borrow later used here


Edited to add a link to what I'd consider a correct answer (as pointed out in the comments by @Shepmaster):

https://stackoverflow.com/a/40187454/683546
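
For reference, here is a minimal sketch of one interning strategy that compiles on stable Rust. It is an assumption on my part that shared ownership through `Rc` is acceptable for this use case, and it is not necessarily what the linked answer does verbatim. The lexicon hands out `Rc<Lexeme>` handles instead of `&Lexeme` references, so no borrow of the lexicon outlives a call to `get`. The `Default` derive and the test name are illustrative additions, not part of the original code.

```rust
use std::collections::HashSet;
use std::rc::Rc;

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Lexeme {
    name: String,
    text: String,
}

#[derive(Default)]
struct Lexicon {
    // Each distinct lexeme is stored once, behind a reference-counted pointer.
    lexemes: HashSet<Rc<Lexeme>>,
}

impl Lexicon {
    /// Returns a shared handle to the canonical lexeme, inserting it if absent.
    pub fn get(&mut self, name: String, text: String) -> Rc<Lexeme> {
        let candidate = Lexeme { name, text };
        // `Rc<Lexeme>` implements `Borrow<Lexeme>`, so the set can be probed
        // with a plain `&Lexeme`.
        if let Some(existing) = self.lexemes.get(&candidate) {
            return Rc::clone(existing);
        }
        let shared = Rc::new(candidate);
        self.lexemes.insert(Rc::clone(&shared));
        shared
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn repeated_lookups_share_one_allocation() {
        let mut lexicon = Lexicon::default();
        let lex1 = lexicon.get("name1".to_string(), "text1".to_string());
        let _lex2 = lexicon.get("name2".to_string(), "text2".to_string());
        let lex3 = lexicon.get("name1".to_string(), "text1".to_string());
        assert_eq!(lex1.name, lex3.name);
        // Both handles point at the same stored lexeme.
        assert!(Rc::ptr_eq(&lex1, &lex3));
    }
}
```

Cloning an `Rc` only bumps a reference count, so sentences can hold `Rc<Lexeme>` values cheaply. If the lexemes need to cross threads, `Arc` would presumably be the drop-in replacement; I have not tried that here.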

  • Can you better explain what the problem here is? Are you asking about borrowing or boxing? – tadman Mar 11 '20 at 17:37
  • @tadman Well, the question is about any strategy that would help attain the stated goal, I don't know if boxing would help... – Enrique Pérez Arnaud Mar 11 '20 at 17:51
  • It looks like your question might be answered by the answers of [Using a HashSet to canonicalize objects in Rust](https://stackoverflow.com/q/40186370/155423); [How can I better store a string to avoid many clones?](https://stackoverflow.com/q/42097611/155423). If not, please **[edit]** your question to explain the differences. Otherwise, we can mark this question as already answered. – Shepmaster Mar 11 '20 at 18:36
  • The term you are looking for is [*string interning*](https://en.wikipedia.org/wiki/String_interning). – Shepmaster Mar 11 '20 at 18:37
  • @Shepmaster Yes, it appears you've nailed the issue. Thanks! – Enrique Pérez Arnaud Mar 11 '20 at 19:09
