How do you efficiently read a Parquet file in Rust and iterate over its rows as a list of structs?
E.g.
struct Reading {
    datetime: chrono::DateTime<chrono::Utc>,
    value: f64,
}

let filename = "readings.parquet";
let readings: Vec<Reading> = ???;
The only approach I've found is the row-based reader in parquet::file::reader::{FileReader, SerializedFileReader}, but it is extremely slow (about 1M rows/s), slower even than converting the Parquet file to a CSV in Python and then reading the CSV into Rust.
Current attempt:
use parquet::file::reader::{FileReader, SerializedFileReader};
use parquet::record::RowAccessor;
use std::fs::File;

struct Reading {
    datetime: String,
    value: f64,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Time the whole read, including opening the file.
    let start = std::time::Instant::now();
    let filename = "readings.parquet";
    let file = File::open(filename)?;
    let reader = SerializedFileReader::new(file)?;
    let mut readings: Vec<Reading> = Vec::new();

    // Row-based iteration: every record is materialized as a Row first.
    for record in reader.get_row_iter(None)? {
        // Columns 0 and 1 hold the date and time as strings, column 2 the value.
        let date: String = record.get_string(0)?.to_string();
        let time: String = record.get_string(1)?.to_string();
        let datetime = format!("{} {}", date, time);
        let last: f64 = record.get_double(2)?;
        readings.push(Reading { datetime, value: last });
    }

    println!("time: {:?}", start.elapsed());
    println!("readings: {:?}", readings.len());
    Ok(())
}
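
For comparison, here is a rough sketch of the shape a columnar approach could take. It is not a benchmarked solution. The attempt above allocates two strings and runs `format!` for every row, on top of whatever the row iterator does internally. If each column were instead decoded in bulk (the parquet crate's `arrow` feature provides `ParquetRecordBatchReaderBuilder` for reading whole batches of columns, not shown here), the structs could be assembled in a single pass. The sketch uses only the standard library and assumes the three columns are already available as slices; `readings_from_columns` is a hypothetical helper name, not part of any crate.

```rust
/// One reading, mirroring the struct used in the attempt above.
#[derive(Debug, Clone, PartialEq)]
struct Reading {
    datetime: String,
    value: f64,
}

/// Hypothetical helper: assembles readings from three already-decoded columns.
/// Returns `None` if the columns do not all have the same length.
fn readings_from_columns(
    dates: &[&str],
    times: &[&str],
    values: &[f64],
) -> Option<Vec<Reading>> {
    if dates.len() != times.len() || dates.len() != values.len() {
        return None;
    }

    // Allocate the output once instead of growing it row by row.
    let mut readings = Vec::with_capacity(values.len());

    for ((date, time), &value) in dates.iter().zip(times).zip(values) {
        // Build "date time" with a single, exactly sized allocation.
        let mut datetime = String::with_capacity(date.len() + 1 + time.len());
        datetime.push_str(date);
        datetime.push(' ');
        datetime.push_str(time);
        readings.push(Reading { datetime, value });
    }

    Some(readings)
}
```

Whether this is actually faster than the row iterator would need to be measured on the real file. The sketch only shows the column-to-struct step, and parsing `datetime` into a `chrono::DateTime<chrono::Utc>` would be a separate cost on top.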