0

Background

To give a little bit of context: I have to mention that I am fairly new to Rust, so some of my questions might be really basic/silly - apologies for that! I am trying to implement a thin wrapper around rusqlite, which should allow to cache inserts to the database into a HashMap and then execute the actual inserts in a batched manner to improve performance of the inserts. The concept is taken from this post, in particular the batched version of the code.

To achieve this, I have thought of the following construct:

  1. A struct DataBase, which holds some basic attributes of the database and a HashMap cache which contains the information of the cached tables
  2. A struct CachedTable holding a vector with the field names and a vector of vectors holding the records to insert.
  3. An enum SQLDataType enumerating a "Text" and "Integer" variant (for now)

The code looks like this:

pub enum SQLDataType {
    Text(String),
    Integer(isize),
}

#[derive(Debug)]
struct CachedTable {
    data: RefCell<Vec<Vec<SQLDataType>>>,
    fields: Vec<String>,
}

#[derive(Debug)]
pub struct DataBase {
    name: String,
    conn: Connection,
    cache: HashMap<String, CachedTable>,
    batch_size: usize,
}

Then I have a function commit_writes which does the follwoing:

  1. Creates a vector of the tables in the cache
  2. Loops through the tables vector and creates a prepared INSERT INTO statement based on the field names and the no of records (in the batched approach one needs to concatenate the list of value placeholders as many times as there are records to process within the VALUES() part of the statement.)
  3. Creates the params vector
  4. Call the statement.execute with the prepared params vector

Issue

I have tried a number of versions and got error messages about missing ToSql traits, livetime and borrowing errors, etc. After a bit of searching I found this stackoverflow question, but the code does not compile anymore and even if it would, it is handling a single data type (String) in the params vector. What I would like to achieve is to pass a vector of mixed data types (Strings and Integers for now) to the statement.execute function. My current version of commit_writes is as follows - the full code can be found here:

impl DataBase {
    pub fn new(db_name: &str) -> Self {
        //... snip ...//
    }

    fn add_to_cache(&mut self, table_name: &str, record: Vec<SQLDataType>) {
        //... snip ...//
    }

    pub fn commit_writes(&mut self) {
        // collect all keys to then iterate over the cache
        // collecting all keys avoids the "move issue" of iterators
        // over a mutable reference to the 'cache' HashMap
        let mut tables: Vec<String> = Vec::new();
        for key in self.cache.keys() {
            tables.push(key.to_owned());
        }
        // process all cached tables and write to the DB
        for table in &tables {
            // only process cached tables that do contain data
            let no_of_records = self.cache[table].data.borrow().len();
            if no_of_records > 0 {
                // create the field list
                let field_list = self.cache[table].fields.join(", ");
                // get the number of elements and create the params part of the SQL
                let no_elems = self.cache[table].fields.len();
                let params_string = vec!["?"; no_elems].join(", ").repeat(no_of_records);
                // create the SQL statement and prepare it
                let sql_ins = format!(
                    "INSERT INTO {} ({}) VALUES ({})",
                    table, field_list, params_string
                );
                let stmt = self.conn.prepare_cached(sql_ins.as_str()).unwrap();

                // create the param values vector
                let mut param_values: Vec<_> = Vec::new();
                let mut int_value: isize = 0;
                let mut string_value: String = "default".to_string();
                for record in self.cache[table].data.borrow().iter() {
                    for item_value in record.iter() {
                        match item_value {
                            SQLDataType::Integer(v) => {
                                int_value = *v;
                                param_values.push(int_value as &dyn ToSql);
                            }
                            SQLDataType::Text(v) => {
                                string_value = *v;
                                param_values.push(string_value as &dyn ToSql);
                            }
                        }
                    }
                }

                // fianlly executed the batch of inserts
                stmt.execute(&*param_values).unwrap();

                // now clear the cached table's data
                self.cache[table].data.borrow_mut().clear();
            }
        }
    }
}

and this is the output of cargo check:

    Checking utilrs v0.1.0 (C:\LocalData\Rust\utilrs)
error[E0605]: non-primitive cast: `isize` as `&dyn ToSql`
   --> src\persistence.rs:104:51
    |
104 | ...                   param_values.push(int_value as &dyn ToSql);
    |                                         ^^^^^^^^^^^^^^^^^^^^^^^ invalid cast
    |
help: consider borrowing the value
    |
104 |                                 param_values.push(&int_value as &dyn ToSql);
    |                                                   +

error[E0605]: non-primitive cast: `String` as `&dyn ToSql`
   --> src\persistence.rs:108:51
    |
108 | ...                   param_values.push(string_value as &dyn ToSql);
    |                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^ invalid cast
    |
help: consider borrowing the value
    |
108 |                                 param_values.push(&string_value as &dyn ToSql);
    |                                                   +

For more information about this error, try `rustc --explain E0605`.
error: could not compile `utilrs` due to 2 previous errors

But then if I add the borrow suggested by the compiler, this leads to the following errors related to the borrow checker:

error[E0506]: cannot assign to `int_value` because it is borrowed
   --> src\persistence.rs:103:33
    |
103 | ...                   int_value = *v;
    |                       ^^^^^^^^^^^^^^ assignment to borrowed `int_value` occurs here
104 | ...                   param_values.push(&int_value as &dyn ToSql);
    |                       -------------------------------------------
    |                       |                 |
    |                       |                 borrow of `int_value` occurs here
    |                       borrow later used here

error[E0506]: cannot assign to `string_value` because it is borrowed
   --> src\persistence.rs:107:33
    |
107 | ...                   string_value = *v;
    |                       ^^^^^^^^^^^^ assignment to borrowed `string_value` occurs here
108 | ...                   param_values.push(&string_value as &dyn ToSql);
    |                       ----------------------------------------------
    |                       |                 |
    |                       |                 borrow of `string_value` occurs here
    |                       borrow later used here

error[E0507]: cannot move out of `*v` which is behind a shared reference
   --> src\persistence.rs:107:48
    |
107 | ...                   string_value = *v;
    |                                      ^^ move occurs because `*v` has type `String`, which does not impent the `Copy` trait

error[E0596]: cannot borrow `stmt` as mutable, as it is not declared as mutable
   --> src\persistence.rs:115:17
    |
93  |                 let stmt = self.conn.prepare_cached(sql_ins.as_str()).unwrap();
    |                     ---- help: consider changing this to be mutable: `mut stmt`
...
115 |                 stmt.execute(&*param_values).unwrap();
    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cannot borrow as mutable

Some errors have detailed explanations: E0506, E0507, E0596.
For more information about an error, try `rustc --explain E0506`.

So I am going around in circles and would really appreciate any help with this!

Dan0175
  • 97
  • 1
  • 5
  • Can you post a minimal reproducible example? https://stackoverflow.com/help/minimal-reproducible-example (The solution might involve `Box`.) – Dogbert Jul 17 '22 at 14:27
  • Hi Dogbert - I have created a complete version of the code in a Rust playground. Hope that helps? https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=2ae9c51f8c68913533a312c9eb538be7 – Dan0175 Jul 17 '22 at 18:52
  • Made a lot of fixes including removing seemingly unnecessary RefCell use. Does this work for you https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=a892ccd4c4708b045f9704d71ad8bbd2? – Dogbert Jul 18 '22 at 02:14
  • @Dogbert - thanks a million, your code works like a charm! Based on a comparison of my code vs. your code, I think I overdid with refernces / dereferencing, aliasing and the use of RefCell. Key takeaways from what I understood is the use of `self.cache.get_mut(table).unwrap().data.clear();` to avoid the use of the RefCell, leaving away the unnecessary `borrow()` and `borrow_mut()` and finally the use of `let mut param_values: Vec> = Vec::new();` combined with `param_values.push(Box::new(v))` and `stmt.execute(rusqlite::params_from_iter(param_values)).unwrap();` Thanks again! – Dan0175 Jul 18 '22 at 20:42

1 Answers1

1

I would suggest storing rusqlite::types::Value values directly in your struct instead of Box<dyn ToSql> like my solution in a comment above, which also works. Here's the new one:

                let mut param_values: Vec<rusqlite::types::Value> = Vec::new();
                for record in self.cache[table].data.iter() {
                    for item_value in record.iter() {
                        match item_value {
                            SQLDataType::Integer(v) => {
                                param_values.push((*v).into());
                            }
                            SQLDataType::Text(v) => {
                                param_values.push(v.clone().into());
                            }
                        }
                    }
                }

Other than that, I went ahead and removed unnecessary RefCell uses and cleaned up some small things. The final code:

use rusqlite::Connection;
use std::collections::HashMap;

// See also https://stackoverflow.com/questions/40559931/vector-store-mixed-types-of-data-in-rust

#[derive(Debug)]
pub enum SQLDataType {
    Text(String),
    Integer(isize),
}

#[derive(Debug)]
struct CachedTable {
    data: Vec<Vec<SQLDataType>>,
    fields: Vec<String>,
}

#[derive(Debug)]
pub struct DataBase {
    name: String,
    conn: Connection,
    cache: HashMap<String, CachedTable>,
    batch_size: usize,
}

impl DataBase {
    pub fn new(db_name: &str) -> Self {
        let db_conn = Connection::open(db_name).unwrap();
        let mut db = DataBase {
            name: db_name.to_owned(),
            conn: db_conn,
            cache: HashMap::new(),
            batch_size: 50,
        };

        db.cache.insert(
            String::from("User"),
            CachedTable {
                data: Vec::new(),
                fields: vec![
                    String::from("Name"),
                    String::from("Age"),
                    String::from("Gender"),
                ],
            },
        );
        db
    }

    pub fn add_to_cache(&mut self, table_name: &str, record: Vec<SQLDataType>) {
        if let Some(chached_table) = self.cache.get_mut(table_name) {
            chached_table.data.push(record);
        }
    }

    pub fn commit_writes(&mut self) {
        // collect all keys to then iterate over the cache
        // collecting all keys avoids the "move issue" of iterators
        // over a mutable reference to the 'cache' HashMap
        let mut tables: Vec<String> = Vec::new();
        for key in self.cache.keys() {
            tables.push(key.to_owned());
        }
        // process all cached tables and write to the DB
        for table in &tables {
            // only process cached tables that do contain data
            let no_of_records = self.cache[table].data.len();
            if no_of_records > 0 {
                // create the field list
                let field_list = self.cache[table].fields.join(", ");
                // get the number of elements and create the params part of the SQL
                let no_elems = self.cache[table].fields.len();
                let params_string = vec!["?"; no_elems].join(", ").repeat(no_of_records);
                // create the SQL statement and prepare it
                let sql_ins = format!(
                    "INSERT INTO {} ({}) VALUES ({})",
                    table, field_list, params_string
                );
                let mut stmt = self.conn.prepare_cached(sql_ins.as_str()).unwrap();

                // create the param values vector
                let mut param_values: Vec<rusqlite::types::Value> = Vec::new();
                for record in self.cache[table].data.iter() {
                    for item_value in record.iter() {
                        match item_value {
                            SQLDataType::Integer(v) => {
                                param_values.push((*v).into());
                            }
                            SQLDataType::Text(v) => {
                                param_values.push(v.clone().into());
                            }
                        }
                    }
                }

                // fianlly executed the batch of inserts
                stmt.execute(rusqlite::params_from_iter(param_values))
                    .unwrap();

                // now clear the cached table's data
                self.cache.get_mut(table).unwrap().data.clear();
            }
        }
    }
}

fn main() {
    let mut db = DataBase::new("test.db");
    let record: Vec<SQLDataType> = vec![
        SQLDataType::Text("John Doe".to_string()),
        SQLDataType::Integer(35),
        SQLDataType::Text("male".to_string()),
    ];
    db.add_to_cache("User", record);
    db.commit_writes();
}
Dogbert
  • 212,659
  • 41
  • 396
  • 397