Questions tagged [rust-polars]
271 questions

0 votes, 1 answer
How to aggregate over all rows in a Polars dataframe?
In a Polars dataframe, I know that I can aggregate over a group of rows that share the same value in a column using, for example, .groupby("first_name").agg([...]).
How can I aggregate over all rows in a dataframe?
For example, I'd like to get the mean…

bwooster
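
A plain-Rust sketch of the difference (standard library only, not the Polars API). A group-by mean produces one value per key, whereas aggregating over all rows produces a single value for the whole column. The functions `mean_all` and `mean_by_group` and the `(name, value)` row shape are hypothetical. In Polars itself, the usual way to reduce over every row is to put an aggregating expression such as `col("age").mean()` (the `age` column is hypothetical) inside `select` instead of `agg`. Check the exact calls against your Polars version.

```rust
use std::collections::BTreeMap;

/// Mean over *all* rows: a single value for the whole column.
/// Returns None for an empty column, mirroring a null result.
pub fn mean_all(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    Some(values.iter().sum::<f64>() / values.len() as f64)
}

/// Mean per group: one value per distinct key, which is what a
/// group-by followed by a mean aggregation produces.
pub fn mean_by_group(rows: &[(&str, f64)]) -> BTreeMap<String, f64> {
    // Accumulate (sum, count) per key, then divide once at the end.
    let mut acc: BTreeMap<String, (f64, usize)> = BTreeMap::new();
    for &(key, v) in rows {
        let e = acc.entry(key.to_string()).or_insert((0.0, 0));
        e.0 += v;
        e.1 += 1;
    }
    acc.into_iter()
        .map(|(k, (sum, n))| (k, sum / n as f64))
        .collect()
}
```
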

0 votes, 0 answers
How do I dynamically compose a Polars expression from user-provided input and run it in Rust?
I'm writing a CLI utility in Rust that allows users to do data-wrangling from the command-line.
I managed to leverage Polars' join commands and parse the columns to join on using:
let selcols1: Vec<_> =…

jnatividad
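
The parsing half of this can be sketched with the standard library alone. `parse_columns` and `validate_columns` are hypothetical helpers, not Polars functions. Once the names are validated, each one can be mapped to a `col(...)` expression to build the `Vec<Expr>` that Polars' lazy `select` and join methods accept, although the exact signatures differ between Polars releases.

```rust
/// Hypothetical helper: split a comma-separated list of column names
/// supplied on the command line, trimming whitespace and skipping
/// empty entries (e.g. from a trailing comma).
pub fn parse_columns(input: &str) -> Vec<String> {
    input
        .split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(String::from)
        .collect()
}

/// Hypothetical helper: reject names that are not among the available
/// columns, so the user gets a clear error before any expression is built.
pub fn validate_columns(requested: &[String], available: &[&str]) -> Result<(), String> {
    for name in requested {
        if !available.contains(&name.as_str()) {
            return Err(format!("unknown column: {name}"));
        }
    }
    Ok(())
}
```
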

0 votes, 0 answers
Fastest way to remove duplicates in Rust Polars?
I have a dataframe that occasionally has more than one observation for the same entity in a group. The groups are sorted by rank, and I only need the first (highest-ranked) entry per entity per group. I have been removing them by concatenating the…

Jage
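
Assume the rows are already sorted so that the preferred row comes first within each group. Removing duplicates then reduces to keeping the first occurrence of each (group, entity) pair, and a single pass with a hash set does that in linear expected time. The sketch below uses plain Rust tuples as a hypothetical stand-in for the DataFrame. In Polars, the analogous operation is a `unique` call restricted to the group and entity columns with a keep-first strategy (`UniqueKeepStrategy::First`), whose exact signature varies by version.

```rust
use std::collections::HashSet;

/// Hypothetical row layout: (group id, entity id, rank).
pub type Row = (u32, u32, u32);

/// Keep only the first row for each (group, entity) pair, assuming rows
/// are already ordered so the preferred (highest-ranked) row comes first.
/// Input order is otherwise preserved.
pub fn keep_first_per_entity(rows: &[Row]) -> Vec<Row> {
    let mut seen: HashSet<(u32, u32)> = HashSet::new();
    rows.iter()
        .copied()
        // `insert` returns false if the pair was already present,
        // so later duplicates are filtered out.
        .filter(|&(group, entity, _)| seen.insert((group, entity)))
        .collect()
}
```
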

0 votes, 0 answers
How to create column of sequences from 1 to group_size in a Rust Polars groupby?
I am currently doing a groupby and ranking values in Polars:
let df = df.clone().lazy().select([
    all(),
    col("value").rank(rank_opts).over(["groupby_id"]).alias("rank")])
    .collect().unwrap();
But I am finding it to be pretty slow. I am…

Jage
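
If the rows within each group are already in the desired order, the sequence 1..group_size is just a running count per group, and computing it needs no sorting or ranking. Here is a std-only sketch; `position_in_group` is a hypothetical name, not a Polars function.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// For each row, return its 1-based position within its group, in the
/// order rows appear, so every group is numbered 1..=group_size.
pub fn position_in_group<K: Hash + Eq + Clone>(group_ids: &[K]) -> Vec<u32> {
    // One running counter per group id.
    let mut counters: HashMap<K, u32> = HashMap::new();
    group_ids
        .iter()
        .map(|g| {
            let c = counters.entry(g.clone()).or_insert(0);
            *c += 1;
            *c
        })
        .collect()
}
```

Because the counter is keyed by group id, rows of the same group do not need to be contiguous.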

0 votes, 0 answers
How to convert an Arrow Vec to a Polars DataFrame (df) in Rust
I have loaded my file into a variable, data_batch, which is a Vec, and now I want to convert it to a Polars DataFrame (df). I have no idea how to do this. There are no examples of this ...
I expect converting it to a Polars DataFrame to be possible, as I have properly…

andy8203

0 votes, 0 answers
Polars: how do I read JSON Lines from S3?
I am at a loss as to which Polars interface I should pass s3_bytes (read from a JSON Lines file) to in order to get a DataFrame.
impl S3_Operation {
    pub async fn new(file: File) -> Self {
        let config = aws_config::load_from_env().await;
        let client =…

andy8203

0 votes, 1 answer
How do I stack a wide Polars DataFrame in Rust into a narrow DataFrame?
In R, I am stacking a data.frame like so: stack(preds[1:(ncol(preds)-4)])
This selects 1,000 columns, and stacks them all into a single column, while creating a second column which is a string, the name of the column that row originally came from.
I…

Jage
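
R's `stack()` returns a `values` column with every selected column's values laid end to end, plus an `ind` column naming the source column of each value. The plain-Rust sketch below shows that layout; it uses a hypothetical `stack` helper over `Vec<f64>` columns and is not Polars code. Polars calls this reshape `melt`, renamed `unpivot` in recent releases, so check which name your version exposes.

```rust
/// Stack wide data into narrow form, like R's `stack()`: all values of
/// the first column, then all of the second, and so on, each paired with
/// the name of the column it came from.
pub fn stack(names: &[&str], columns: &[Vec<f64>]) -> (Vec<f64>, Vec<String>) {
    assert_eq!(names.len(), columns.len(), "one name per column");
    // Pre-size both outputs to the total number of cells.
    let total: usize = columns.iter().map(Vec::len).sum();
    let mut values = Vec::with_capacity(total);
    let mut ind = Vec::with_capacity(total);
    for (name, col) in names.iter().zip(columns) {
        values.extend_from_slice(col);
        ind.extend(std::iter::repeat(name.to_string()).take(col.len()));
    }
    (values, ind)
}
```
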

0 votes, 0 answers
How to apply a function to elements of a Polars Series in Rust?
I am trying to convert some code that uses an ndarray to a Polars DataFrame.
I have a function that samples from multiple distributions. I create an ndarray of shape number_of_distributions x number_of_samples. Then I iterate over each column, and for…

Jage

0 votes, 1 answer
Rust: transforming a 2D array into a 1D array
Does anyone know why
x = df.select(["A"]).unwrap().to_ndarray::().unwrap()
is treated as a 2D array when I want a 1D array? Is there a function to reshape it into a 1D array? Here the shape of y is (100, 1).
The type of x is a…

Arli94
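
`select` returns a DataFrame, and converting a DataFrame to an array gives one column per selected column. A single selected column therefore comes back with shape (n, 1), and dropping the extra axis just means extracting that one column. The sketch below models the matrix as a row-major `Vec<Vec<f64>>`, a hypothetical stand-in for the ndarray. With the ndarray crate, calling `column(0).to_owned()` on an `Array2` gives an `Array1` in the same way.

```rust
/// Flatten an n x 1 matrix (stored row-major as Vec<Vec<f64>>) into a 1D Vec.
/// Returns None if any row does not have exactly one element, so a
/// genuinely 2D input is not silently flattened.
pub fn to_1d(matrix: &[Vec<f64>]) -> Option<Vec<f64>> {
    matrix
        .iter()
        .map(|row| if row.len() == 1 { Some(row[0]) } else { None })
        // Collecting Option items stops at the first None.
        .collect()
}
```
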

0 votes, 1 answer
Converting a Utf8 Series into a Series of List via a custom function in Rust Polars
I have a Utf8 column in my DataFrame, and from it I want to create a column of List.
In particular, for each row I am taking the text of an HTML document and using soup to parse out all the paragraphs of class
, and storing the collection of…

BrettW
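
Structurally, this transformation maps each string row to a list of strings and leaves nulls as nulls. The sketch below shows that shape in plain Rust, with `Option` standing in for nullability. `map_to_lists` is hypothetical. The extractor in the test is only a trivial stand-in, since real paragraph extraction would need an HTML parser crate.

```rust
/// Apply `extract` to every non-null string, producing one list per row.
/// Null inputs stay null, as they would in a Utf8 -> List(Utf8) column.
pub fn map_to_lists<F>(docs: &[Option<&str>], extract: F) -> Vec<Option<Vec<String>>>
where
    F: Fn(&str) -> Vec<String>,
{
    docs.iter()
        .copied()
        .map(|d| d.map(|s| extract(s)))
        .collect()
}
```
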

0 votes, 0 answers
Diagonally Concatenate Polars DataFrame but lift Null of nested structs rather than nulling out child fields
I have a heterogeneous container holding a timestamp and one of several inner types, which I serialize into JSON in batches and load into DataFrames.
#[derive(serde::Serialize)]
pub struct Container {
    pub timestamp: Option,
    pub…

Dennis Collective

0 votes, 1 answer
Writing long rows using polars DataFrame throws runtime error
I have the following async block of code that runs as part of a larger program. It runs successfully when the DataFrame has a row with length 10 or 30, but when I increase that to a larger number such as 300, it tries to write the DataFrame as Parquet…

ripbozo

0 votes, 1 answer
How to create a Polars DataFrame with Vec> as a Series
I want a DataFrame like this:
timestamp | bids | asks | ticker
-------------------------------------------------------------------
1598215600 | [[10, 20], [15, 30]] | [[20, 10], [25, 20]] | "AAPL"
1598222400 | …

ripbozo

0 votes, 1 answer
Read CSV file into Polars dataframe with Rust
I would like to read the CSV file into a Polars dataframe.
The code copied from the official documentation fails to run with cargo.
use polars::prelude::*;
use std::fs::File;
fn example() -> Result {
    let file =…

David

0 votes, 0 answers
Apply a function to a column out-of-memory in Python Polars
I have a large GIS dataset (167x25e6) that was generated from GeoJSON, converted via .csv and now stored as Parquet. This is the first time I have really had to deal with out-of-memory DataFrames, and I am still trying to find out whether Polars is the right option for my…

Lionel Peer