
I am trying to write a function in Rust that I can call from Python. It should accept a list of dictionaries (think pandas DataFrame-like data) and access the keys and values from Rust. How can I accomplish this? I am using PyO3. Do I need to define a struct that matches the key-value pairs of the Python dict input?

As a sample function, I am trying to pass in a list of dictionaries and sum the values stored under a given key (the key argument below) into a total. Each dictionary in my Python list of dicts contains that key, and its value is an int.

use pyo3::prelude::*;
use pyo3::wrap_pyfunction;
use pyo3::types::PyDict;

#[pyfunction]
fn sum_list_dicts(a: Vec<PyDict>, key: String) -> PyResult<i32> {
    let mut tot = 0_i32;

    for d in a.iter() {
        tot += d[key];
    }
    Ok(tot)
}

#[pymodule]
fn rustpy(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(sum_list_dicts, m)?)?;

    Ok(())
}
cpage

1 Answer


This really depends on what you want to do. If you don't need to work with the Python objects themselves, you can let PyO3 convert the argument into Rust collections for you:

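use std::collections::HashMap;

use pyo3::prelude::*;

#[pyfunction]
fn sum_list_dicts(a: Vec<HashMap<String, i32>>, key: String) -> PyResult<i32> {
    let mut tot = 0_i32;

    for d in a.iter() {
        // Indexing a HashMap panics if this dict has no entry for `key`.
        tot += d[&key];
    }
    Ok(tot)
}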
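
If a dict might be missing the key, or the total might not fit in an i32, the summing logic can be pulled into a plain Rust function that returns an error instead of panicking. This is only a minimal sketch: SumError and sum_key are hypothetical names made up for illustration, not part of PyO3.

```rust
use std::collections::HashMap;

/// Why a sum could not be computed (hypothetical error type for this sketch).
#[derive(Debug, PartialEq)]
pub enum SumError {
    /// The dict at this position in the list has no entry for the key.
    MissingKey { index: usize },
    /// The running total no longer fits in an i32.
    Overflow,
}

/// Sum the values stored under `key`, returning an error instead of panicking.
pub fn sum_key(dicts: &[HashMap<String, i32>], key: &str) -> Result<i32, SumError> {
    let mut tot = 0_i32;
    for (index, d) in dicts.iter().enumerate() {
        // `get` returns None for a missing key, unlike `d[key]`, which panics.
        let value = d.get(key).ok_or(SumError::MissingKey { index })?;
        // `checked_add` returns None on overflow instead of wrapping or panicking.
        tot = tot.checked_add(*value).ok_or(SumError::Overflow)?;
    }
    Ok(tot)
}
```

The #[pyfunction] wrapper could then call sum_key(&a, &key) and map each SumError variant to a Python exception of your choice, for example a KeyError for the missing-key case.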

If you want to work with PyList, PyDict etc, this works too:

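use pyo3::prelude::*;
use pyo3::types::{PyDict, PyInt, PyList};

#[pyfunction]
fn sum_list_dicts(a: &PyList, key: String) -> PyResult<i32> {
    let mut tot = 0_i32;

    for d in a.iter() {
        // Each list element must be a dict; a failed downcast is returned as a Python exception.
        let dict = d.downcast::<PyDict>()?;
        // unwrap() panics if this dict has no entry for `key`.
        let value = dict.get_item(&key).unwrap();
        tot += value.downcast::<PyInt>()?.extract::<i32>()?;
    }
    Ok(tot)
}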

With either, you can then simply invoke it from the Python side:

from rustpy import sum_list_dicts  # assumes the rustpy module name from the question

a = [{"ham": 0, "eggs": 0}, {"eggs": 1}, {"eggs": 3, "spam": 2}]
b = sum_list_dicts(a, "eggs")
print(b)  # prints 4
Kvothe
  • Thanks for the answer. Just curious, does using PyList/PyDict give a performance boost? It looks far more verbose than using Vec/HashMap. – xxx222 Mar 04 '22 at 10:25
  • Creating Rust structures from these effectively means copying everything, which is probably slower. It depends on your use case, and you should benchmark this. – Kvothe Mar 05 '22 at 11:10
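
To get a rough feel for the copying cost mentioned in the comment above, here is a pure-Rust analogy. It is not a PyO3 benchmark: slices of (key, value) pairs stand in for the Python dicts, and Row, sum_borrowed, sum_copied and time_it are all hypothetical names for this sketch. For real numbers you would still need to time the actual Python call on your own data.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Borrowed stand-in for one Python dict: a slice of (key, value) pairs.
type Row<'a> = &'a [(&'a str, i32)];

/// Sum in place: scan each row for the key without copying anything.
/// Rows that lack the key are skipped.
pub fn sum_borrowed(rows: &[Row<'_>], key: &str) -> i32 {
    rows.iter()
        .filter_map(|row| row.iter().find(|(k, _)| *k == key).map(|(_, v)| *v))
        .sum()
}

/// Copy every row into an owned HashMap<String, i32> first, then sum.
/// This mimics converting the whole input into Rust collections up front.
pub fn sum_copied(rows: &[Row<'_>], key: &str) -> i32 {
    let owned: Vec<HashMap<String, i32>> = rows
        .iter()
        .map(|row| row.iter().map(|(k, v)| (k.to_string(), *v)).collect())
        .collect();
    owned.iter().filter_map(|d| d.get(key)).sum()
}

/// Run a closure once and report how long it took (a crude single-shot timer).
pub fn time_it<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}
```

Calling time_it(|| sum_copied(&rows, "eggs")) and time_it(|| sum_borrowed(&rows, "eggs")) on a large input gives two durations to compare. A single run is noisy, so repeat the measurement before drawing conclusions.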