Hi, I am trying to read from Postgres into a Polars DataFrame in a generic way.
I have read a post here, Rust: Read dataframe in polars from mysql,
about reading from MySQL, and I want to adapt it so I don't have to handle the columns by hand for each new query.
My plan is to store all the values in one large vector, ordered column-major (all of column 0's values, then all of column 1's, and so on), with each value wrapped in an enum.
I have the following code:
use chrono::{DateTime, Utc};
use polars::prelude::*;

let rows = client.query(&select_sql, &[]).await?;
#[derive(Clone, Debug)]
enum GenericValue {
    String(String),
    Bool(bool),
    DateTime(DateTime<Utc>),
    Int32(i32),
    UInt(u32),
}
let number_of_rows = rows.len();
let number_of_columns = rows.first().expect("expected one row").columns().len();
println!("Number of columns: {:?}", number_of_columns);
println!("Number of rows: {:?}", number_of_rows);
// Column-major layout: column c occupies data[c * number_of_rows .. (c + 1) * number_of_rows].
let mut data: Vec<Option<GenericValue>> = vec![None; number_of_columns * number_of_rows];
for (row_index, row) in rows.iter().enumerate() {
    for (col_index, column) in row.columns().iter().enumerate() {
        let col_type = column.type_().to_string();
        // Column-major index into the flat buffer.
        let index = col_index * number_of_rows + row_index;
        match col_type.as_str() {
            "int4" => data[index] = Some(GenericValue::Int32(row.get(col_index))),
            "text" | "varchar" => data[index] = Some(GenericValue::String(row.get(col_index))),
            "bool" => data[index] = Some(GenericValue::Bool(row.get(col_index))),
            "timestamptz" => data[index] = Some(GenericValue::DateTime(row.get(col_index))),
            other => panic!("unhandled column type: {}", other),
        }
    }
}
for c in 0..number_of_columns {
    let start_index = c * number_of_rows;
    let slice = &data[start_index..start_index + number_of_rows];
    // Peek at the first value of the column (intended for deciding the column's type).
    let first: Option<GenericValue> = data[start_index].clone();
    // This line does not compile: collecting GenericValue into Utf8Chunked
    // requires GenericValue to implement PolarsAsRef<str>.
    let chunked_data: Utf8Chunked = slice.iter().map(|v| v.clone().unwrap()).collect();
    println!("{:?}", chunked_data);
    // let ca_country: Utf8Chunked = values.iter().map(|v| &*v.country).collect();
}
The problem is that this requires implementing the trait PolarsAsRef<str> for GenericValue, which I am not sure how to do, and I also don't see how the chunked data would handle NULLs.
How can I create the correct chunked data for each column's Series here?
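
For what it's worth, here is the direction I'm imagining for building each Series. It is an untested sketch: column_to_series is a made-up helper name, and I'm assuming that collecting an iterator of Option values into a ChunkedArray produces nulls for the Nones, which would sidestep PolarsAsRef<str> entirely.

// Untested sketch: convert one column's slice of the flat buffer into a Series.
// `column_to_series` is a made-up helper; it assumes every non-NULL value in a
// column has the same GenericValue variant.
fn column_to_series(name: &str, slice: &[Option<GenericValue>]) -> Series {
    // Peek at the first non-NULL value to decide which ChunkedArray to build.
    match slice.iter().flatten().next() {
        Some(GenericValue::String(_)) => {
            // Collecting Option<&str> should yield a Utf8Chunked with nulls
            // for the Nones, so no PolarsAsRef<str> impl is needed.
            let mut ca: Utf8Chunked = slice
                .iter()
                .map(|v| match v {
                    Some(GenericValue::String(s)) => Some(s.as_str()),
                    _ => None,
                })
                .collect();
            ca.rename(name);
            ca.into_series()
        }
        Some(GenericValue::Int32(_)) => {
            let mut ca: Int32Chunked = slice
                .iter()
                .map(|v| match v {
                    Some(GenericValue::Int32(i)) => Some(*i),
                    _ => None,
                })
                .collect();
            ca.rename(name);
            ca.into_series()
        }
        Some(GenericValue::Bool(_)) => {
            let mut ca: BooleanChunked = slice
                .iter()
                .map(|v| match v {
                    Some(GenericValue::Bool(b)) => Some(*b),
                    _ => None,
                })
                .collect();
            ca.rename(name);
            ca.into_series()
        }
        // DateTime/UInt variants and all-NULL columns still need handling.
        _ => todo!(),
    }
}

If that works, the per-column Series could then be collected into a Vec and passed to DataFrame::new.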
Note: this is intended to move away from connectorx, which still seems pinned to a very old Polars version. I don't get much performance benefit from it anyway, because I use partitioned Postgres tables, which seem to break connectorx's parallelism.
Thanks