I am trying to store the result of a parsing operation on a text file (parsed with nom) into a HashMap. The result consists of a Vec buffer and some slices over that buffer. The goal is to store those together in a tuple or struct as a value in the hash map (with a String key), but I can't work around the lifetime issues.
Context
The parsing itself takes an &[u8] and returns some data structure containing slices over that same input, e.g.:
struct Cmd<'a> {
    pub name: &'a str
}
fn parse<'a>(input: &'a [u8]) -> Vec<Cmd<'a>> {
    [...]
}
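For illustration, here is a hypothetical `parse` standing in for the nom parser. It assumes UTF-8 input and treats each non-empty, trimmed line as a command name. It shows what "slices over that same input" means in practice: every `Cmd::name` points into the caller's buffer, and nothing is copied.

```rust
// Hypothetical stand-in for the nom parser, not the real one.
#[derive(Debug, PartialEq)]
struct Cmd<'a> {
    pub name: &'a str,
}

fn parse<'a>(input: &'a [u8]) -> Vec<Cmd<'a>> {
    // Assumption: input is UTF-8; anything else yields no commands.
    let text = match std::str::from_utf8(input) {
        Ok(t) => t,
        Err(_) => return Vec::new(),
    };
    // Each non-empty, trimmed line becomes one command borrowing from `input`.
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(|line| Cmd { name: line })
        .collect()
}

fn main() {
    let buffer: Vec<u8> = b"load a.txt\n\nrun\n".to_vec();
    let cmds = parse(&buffer[..]);
    assert_eq!(cmds, vec![Cmd { name: "load a.txt" }, Cmd { name: "run" }]);

    // Every `name` lies inside `buffer`'s heap allocation.
    let start = buffer.as_ptr() as usize;
    let end = start + buffer.len();
    for cmd in &cmds {
        let p = cmd.name.as_ptr() as usize;
        assert!(p >= start && p < end);
    }
    println!("{:?}", cmds);
}
```

Because `cmds` borrows `buffer`, `buffer` must outlive `cmds`. That is the constraint the rest of the question runs into.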
Because the parser returns slices into its input rather than owned data, I first need to store the input text in a Vec so that the output slices remain valid, so something like:
struct Entry<'a> {
    pub input_data: Vec<u8>,
    pub parsed_result: Vec<Cmd<'a>>
}
Then I would ideally store this Entry into a HashMap. This is where the trouble starts. I tried two different approaches:
Attempt A: Store then parse
Create the HashMap entry first with the input, parse by referencing that entry's buffer directly, and then update the entry with the result.
pub fn store_and_parse(filename: &str, map: &mut HashMap<String, Entry>) {
    let buffer: Vec<u8> = load_from_file(filename);
    let mut entry = Entry{ input_data: buffer, parsed_result: vec![] };
    let cmds = parse(&entry.input_data[..]);
    entry.parsed_result = cmds;
    map.insert(filename.to_string(), entry);
}
This doesn't work because the borrow checker complains that &entry.input_data[..] borrows with the same lifetime as entry, and therefore entry cannot be moved into map while that borrow is active.
error[E0597]: `entry.input_data` does not live long enough
--> src\main.rs:26:23
|
23 | pub fn store_and_parse(filename: &str, map: &mut HashMap<String, Entry>) {
| --- has type `&mut std::collections::HashMap<std::string::String, Entry<'1>>`
...
26 | let cmds = parse(&entry.input_data[..]);
| ^^^^^^^^^^^^^^^^ borrowed value does not live long enough
27 | entry.parsed_result = cmds;
28 | map.insert(filename.to_string(), entry);
| --------------------------------------- argument requires that `entry.input_data` is borrowed for `'1`
29 | }
| - `entry.input_data` dropped here while still borrowed
error[E0505]: cannot move out of `entry` because it is borrowed
--> src\main.rs:28:38
|
26 | let cmds = parse(&entry.input_data[..]);
| ---------------- borrow of `entry.input_data` occurs here
27 | entry.parsed_result = cmds;
28 | map.insert(filename.to_string(), entry);
| ------ ^^^^^ move out of `entry` occurs here
| |
| borrow later used by call
Attempt B: Parse then store
Parse first, then try to store both the Vec buffer and the slices into it together in the HashMap.
pub fn parse_and_store(filename: &str, map: &mut HashMap<String, Entry>) {
    let buffer: Vec<u8> = load_from_file(filename);
    let cmds = parse(&buffer[..]);
    let entry = Entry{ input_data: buffer, parsed_result: cmds };
    map.insert(filename.to_string(), entry);
}
This doesn't work because the borrow checker complains that cmds has the same lifetime as &buffer[..], but buffer will be dropped by the end of the function. It ignores the fact that cmds and buffer have the same lifetime and are both (I wish) moved into entry, which is itself moved into map, so there should be no lifetime issue here.
error[E0597]: `buffer` does not live long enough
--> src\main.rs:33:21
|
31 | pub fn parse_and_store(filename: &str, map: &mut HashMap<String, Entry>) {
| --- has type `&mut std::collections::HashMap<std::string::String, Entry<'1>>`
32 | let buffer: Vec<u8> = load_from_file(filename);
33 | let cmds = parse(&buffer[..]);
| ^^^^^^ borrowed value does not live long enough
34 | let entry = Entry{ input_data: buffer, parsed_result: cmds };
35 | map.insert(filename.to_string(), entry);
| --------------------------------------- argument requires that `buffer` is borrowed for `'1`
36 | }
| - `buffer` dropped here while still borrowed
error[E0505]: cannot move out of `buffer` because it is borrowed
--> src\main.rs:34:34
|
31 | pub fn parse_and_store(filename: &str, map: &mut HashMap<String, Entry>) {
| --- has type `&mut std::collections::HashMap<std::string::String, Entry<'1>>`
32 | let buffer: Vec<u8> = load_from_file(filename);
33 | let cmds = parse(&buffer[..]);
| ------ borrow of `buffer` occurs here
34 | let entry = Entry{ input_data: buffer, parsed_result: cmds };
| ^^^^^^ move out of `buffer` occurs here
35 | map.insert(filename.to_string(), entry);
| --------------------------------------- argument requires that `buffer` is borrowed for `'1`
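In both attempts, the borrow checker rejects the buffer being moved while something borrowed from it is still alive. It does not consider where the bytes actually live. The following minimal sketch uses a hypothetical whitespace-splitting `parse` and a made-up helper `count_then_move`. The same kind of move compiles here because the borrowed results are dropped first:

```rust
struct Cmd<'a> {
    name: &'a str,
}

// Hypothetical stand-in parser: each whitespace-separated word becomes a
// `Cmd` whose `name` borrows from `input`.
fn parse<'a>(input: &'a [u8]) -> Vec<Cmd<'a>> {
    std::str::from_utf8(input)
        .unwrap_or("")
        .split_whitespace()
        .map(|word| Cmd { name: word })
        .collect()
}

// Hypothetical helper: parses `buffer`, then moves it out once nothing
// borrows from it anymore.
fn count_then_move(buffer: Vec<u8>) -> (usize, Vec<u8>) {
    let count = {
        let cmds = parse(&buffer[..]); // `cmds` borrows `buffer`
        cmds.len()
    }; // `cmds` is dropped here, and the borrow of `buffer` ends with it

    // Moving `buffer` is accepted because no borrow of it is still alive.
    // In Attempt B, `cmds` goes into the same `Entry` as `buffer`, so the
    // borrow is still alive at the move, which is what E0505 reports.
    (count, buffer)
}

fn main() {
    let (count, buffer) = count_then_move(b"load run".to_vec());
    assert_eq!(count, 2);
    // The moved buffer can be borrowed again as usual.
    for cmd in parse(&buffer) {
        println!("{}", cmd.name);
    }
}
```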
Minimal (non-)working example
use std::collections::HashMap;
#[derive(Debug, PartialEq)]
struct Cmd<'a> {
    name: &'a str
}
fn parse<'a>(input: &'a [u8]) -> Vec<Cmd<'a>> {
    Vec::new()
}
fn load_from_file(filename: &str) -> Vec<u8> {
    Vec::new()
}
#[derive(Debug, PartialEq)]
struct Entry<'a> {
    pub input_data: Vec<u8>,
    pub parsed_result: Vec<Cmd<'a>>
}
// pub fn store_and_parse(filename: &str, map: &mut HashMap<String, Entry>) {
//     let buffer: Vec<u8> = load_from_file(filename);
//     let mut entry = Entry{ input_data: buffer, parsed_result: vec![] };
//     let cmds = parse(&entry.input_data[..]);
//     entry.parsed_result = cmds;
//     map.insert(filename.to_string(), entry);
// }
pub fn parse_and_store(filename: &str, map: &mut HashMap<String, Entry>) {
    let buffer: Vec<u8> = load_from_file(filename);
    let cmds = parse(&buffer[..]);
    let entry = Entry{ input_data: buffer, parsed_result: cmds };
    map.insert(filename.to_string(), entry);
}
fn main() {
    println!("Hello, world!");
}
Edit: Attempt with 2 maps
As Kevin pointed out, and this is what threw me off the first time (in the attempts above), the borrow checker doesn't understand that moving a Vec doesn't invalidate the slices, because the heap buffer of the Vec is not touched. Fair enough.
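Kevin's point can be checked at runtime. This sketch moves non-empty Vecs into a HashMap and confirms that each byte buffer keeps its heap address, even as the map grows and rehashes. The helper name `addresses_survive_inserts` is made up for illustration. The sketch only demonstrates the observation. Safe Rust has no way to tie a borrow to the heap allocation instead of the Vec that owns it, so the borrow checker still rejects the attempts above.

```rust
use std::collections::HashMap;

/// Hypothetical helper: inserts `n` buffers into a map and checks that each
/// buffer's bytes are still at the heap address they had before the move.
fn addresses_survive_inserts(n: usize) -> bool {
    let mut map: HashMap<String, Vec<u8>> = HashMap::new();
    let mut addresses = Vec::with_capacity(n);
    for i in 0..n {
        let key = format!("file{}.txt", i);
        // Non-empty, so the Vec owns a real heap allocation.
        let buffer = key.clone().into_bytes();
        addresses.push((key.clone(), buffer.as_ptr()));
        // Moving the Vec into the map copies its (pointer, length, capacity)
        // header; the heap allocation holding the bytes is not touched.
        map.insert(key, buffer);
    }
    // Growing the map moved the headers around again, but never the bytes.
    addresses.iter().all(|(key, addr)| map[key].as_ptr() == *addr)
}

fn main() {
    assert!(addresses_survive_inserts(100));
    println!("all heap buffers stayed at their original addresses");
}
```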
Side note: I am ignoring the parts of Kevin's answer related to using indices (the Rust documentation explicitly states slices are a better replacement for indices, so I feel this is working against the language) and the use of external crates (which are also explicitly working against the language). I am trying to learn and understand how to do this "the Rust way", not to make it work at all costs.
So my immediate reaction was to change the data structure: first insert the storage Vec into a first HashMap and, once it's there, call the parse() function to create slices pointing directly into the HashMap value. Then store those slices in a second HashMap, which naturally dissociates the two. However, that also doesn't work as soon as I put all of it in a loop, which is the broader goal of this code:
fn two_maps<'a>(
    filename: &str,
    input_map: &'a mut HashMap<String, Vec<u8>>,
    cmds_map: &mut HashMap<String, Vec<Cmd<'a>>>,
    queue: &mut Vec<String>) {
    {
        let buffer: Vec<u8> = load_from_file(filename);
        input_map.insert(filename.to_string(), buffer);
    }
    {
        let buffer = input_map.get(filename).unwrap();
        let cmds = parse(&buffer[..]);
        for cmd in &cmds {
            // [...] Find further dependencies to load and parse
            queue.push("...".to_string());
        }
        cmds_map.insert(filename.to_string(), cmds);
    }
}
fn main() {
    let mut input_map = HashMap::new();
    let mut cmds_map = HashMap::new();
    let mut queue = Vec::new();
    queue.push("file1.txt".to_string());
    while let Some(path) = queue.pop() {
        println!("Loading file: {}", path);
        two_maps(&path[..], &mut input_map, &mut cmds_map, &mut queue);
    }
}
The problem here is that once the input buffer is in the first map, input_map, referencing it binds the lifetime of each new parsed result to the entry of that HashMap, and therefore to the &'a mut reference (hence the added 'a lifetime). Without it, the compiler complains that data flows from input_map into cmds_map with unrelated lifetimes, which is fair enough. But with it, the &'a mut reference to input_map becomes locked on the first loop iteration and is never released, and the borrow checker chokes on the second iteration, quite rightfully so.
So I am stuck again. Is what I am trying to do completely unreasonable and impossible in Rust? How can I approach the problem (algorithms, data structures) to make things work lifetime-wise? I really don't see what the "Rust way" is here to store a collection of buffers and slices over those buffers. Is the only solution (one I want to avoid) to load all files first and then parse them? That is very impractical in my case, because most files contain references to other files and I want to load only the minimal chain of dependencies (likely < 10 files), not the entire collection (something like 3000+ files), and I can only discover dependencies by parsing each file.
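For comparison, the load-everything-first approach does compile, because every insert into `input_map` finishes before any slice into it is created. The sketch below relies on a few assumptions. It uses a hypothetical `parse`, a `load_from_file` with hard-coded contents in place of real file I/O, and a made-up `dependencies_after_loading_all`. It also needs the complete file list up front, which is exactly what the dependency-driven loop cannot provide.

```rust
use std::collections::HashMap;

struct Cmd<'a> {
    name: &'a str,
}

// Hypothetical stand-in parser: every whitespace-separated word is a
// dependency name borrowed from `input`.
fn parse<'a>(input: &'a [u8]) -> Vec<Cmd<'a>> {
    std::str::from_utf8(input)
        .unwrap_or("")
        .split_whitespace()
        .map(|word| Cmd { name: word })
        .collect()
}

// Hypothetical loader with fixed contents instead of real file I/O.
fn load_from_file(filename: &str) -> Vec<u8> {
    match filename {
        "file1.txt" => b"file2.txt file3.txt".to_vec(),
        _ => Vec::new(),
    }
}

// Load everything, then parse: requires the complete file list up front.
fn dependencies_after_loading_all(files: &[&str]) -> HashMap<String, Vec<String>> {
    // Phase 1: all mutation of `input_map` (each insert needs `&mut`) happens here.
    let mut input_map: HashMap<String, Vec<u8>> = HashMap::new();
    for &file in files {
        input_map.insert(file.to_string(), load_from_file(file));
    }

    // Phase 2: `input_map` is only borrowed immutably from now on, so any
    // number of parsed results can hold slices into it at the same time.
    let mut cmds_map: HashMap<&str, Vec<Cmd<'_>>> = HashMap::new();
    for (name, buffer) in &input_map {
        cmds_map.insert(name.as_str(), parse(buffer));
    }

    // Copy the names out only so the result can outlive `input_map`.
    cmds_map
        .iter()
        .map(|(name, cmds)| {
            let deps: Vec<String> = cmds.iter().map(|c| c.name.to_string()).collect();
            (name.to_string(), deps)
        })
        .collect()
}

fn main() {
    let deps = dependencies_after_loading_all(&["file1.txt", "file2.txt", "file3.txt"]);
    println!("{:?}", deps["file1.txt"]);
}
```

The split into two phases is what makes this accepted. The `&mut` borrows all end in phase 1, so phase 2 can hold as many shared borrows as it needs. Interleaving the two phases, as the queue-driven loop does, breaks that.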
It seems the core of the issue is that storing the input buffers in any kind of data structure requires a mutable reference to that data structure for the duration of the insert operation. That is incompatible with holding long-lived immutable references to each individual buffer (for the slices), because those references need to have the same lifetime as per the HashMap definition. Is there any other data structure (maybe an immutable one) that lifts this restriction? Or am I completely on the wrong track?