Questions tagged [rust-polars]

271 questions
0 votes · 1 answer

How to aggregate over all rows in a Polars dataframe?

In a Polars DataFrame, I know that I can aggregate over a group of rows that share the same value in a column using, for example, .groupby("first_name").agg([...]). How can I aggregate over all rows in a DataFrame? For example, I'd like to get the mean…
bwooster
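Aggregating over all rows amounts to aggregating one group that contains every row; in Polars this generally means calling select with aggregation expressions (for example col("age").mean()) and no groupby at all. The following is a minimal plain-Rust sketch of that idea using only the standard library, not the Polars API; the columns first_name and age and the helpers mean and mean_by_key are hypothetical.

```rust
use std::collections::HashMap;

/// Mean over every value: the "aggregate over all rows" case.
fn mean(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    Some(values.iter().sum::<f64>() / values.len() as f64)
}

/// Mean per distinct key, analogous to a groupby followed by a mean aggregation.
fn mean_by_key(keys: &[&str], values: &[f64]) -> HashMap<String, f64> {
    // Accumulate (sum, count) for each key.
    let mut acc: HashMap<String, (f64, usize)> = HashMap::new();
    for (k, v) in keys.iter().zip(values) {
        let entry = acc.entry(k.to_string()).or_insert((0.0, 0));
        entry.0 += v;
        entry.1 += 1;
    }
    acc.into_iter()
        .map(|(k, (sum, n))| (k, sum / n as f64))
        .collect()
}

fn main() {
    // Hypothetical columns standing in for a DataFrame.
    let first_name = ["Ann", "Bob", "Ann", "Cy"];
    let age = [30.0, 40.0, 50.0, 20.0];

    // One result per first_name, like the groupby in the question.
    let per_name = mean_by_key(&first_name, &age);

    // One result for the whole frame: no grouping key at all.
    let overall = mean(&age);

    // Equivalent: group by a constant key, which yields exactly one group.
    let constant_key = vec!["all"; age.len()];
    let one_group = mean_by_key(&constant_key, &age);

    println!("per name: {:?}", per_name);
    println!("overall: {:?}", overall);
    println!("constant key: {:?}", one_group);
}
```

The constant-key variant is shown only to make the equivalence concrete; with no key, the single aggregate is the simpler route.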
0 votes · 0 answers

How do I dynamically compose a Polars expression from user-provided input and run it in Rust?

I'm writing a CLI utility in Rust that lets users do data wrangling from the command line. I managed to leverage Polars' join commands and parse the columns to join on using: let selcols1: Vec<_> =…
jnatividad
0 votes · 0 answers

Fastest way to remove duplicates in Rust Polars?

I have a DataFrame which occasionally has more than one observation for the same entity within a group. The groups are sorted by a rank, and I only need the first entry (highest rank) per entity per group. I have been removing them by concatenating the…
Jage
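Keeping only the first (highest-ranked) row per entity per group is a "keep first occurrence of each (group, entity) key" pass. The following is a minimal plain-Rust sketch of that logic using only the standard library, not the Polars API; the Row type, its fields, and the helpers row and keep_best_per_entity are hypothetical, and it assumes a lower rank number means a higher rank.

```rust
use std::collections::HashSet;

/// Hypothetical row type; assumes a lower `rank` number means a higher rank.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    group: u32,
    entity: String,
    rank: u32,
}

/// Convenience constructor for the hypothetical rows.
fn row(group: u32, entity: &str, rank: u32) -> Row {
    Row { group, entity: entity.to_string(), rank }
}

/// Keep only the best-ranked row for each (group, entity) pair.
fn keep_best_per_entity(mut rows: Vec<Row>) -> Vec<Row> {
    // Stable sort by group, then rank, so the first occurrence of each pair is its best row.
    rows.sort_by_key(|r| (r.group, r.rank));
    // `insert` returns false for a pair already seen, so later duplicates are dropped.
    let mut seen: HashSet<(u32, String)> = HashSet::new();
    rows.into_iter()
        .filter(|r| seen.insert((r.group, r.entity.clone())))
        .collect()
}

fn main() {
    let rows = vec![
        row(1, "a", 2),
        row(1, "a", 1),
        row(1, "b", 3),
        row(2, "a", 5),
        row(2, "a", 4),
    ];
    for r in keep_best_per_entity(rows) {
        println!("{:?}", r);
    }
}
```

If the rows are already sorted by rank within each group, as the question describes, the sort can be dropped and the filter alone is a single pass over the data.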
0 votes · 0 answers

How to create column of sequences from 1 to group_size in a Rust Polars groupby?

I am currently doing a groupby and ranking values in Polars: let df = df.clone().lazy().select([all(), col("value").rank(rank_opts).over(["groupby_id"]).alias("rank")]).collect().unwrap(); But I am finding it pretty slow. I am…
Jage
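A sequence running from 1 to group_size is a running count per group. The following is a minimal plain-Rust sketch of that single-pass logic using only the standard library, not the Polars API; seq_within_group and the sample groupby_id values are hypothetical. One counter per group id is incremented as each row is visited, so groups do not need to be contiguous. If the rows are already sorted ascending by value within each group, this sequence coincides with an ascending ordinal rank without any sorting.

```rust
use std::collections::HashMap;

/// For each row, return its 1-based position within its group, in row order.
fn seq_within_group(group_ids: &[u32]) -> Vec<u32> {
    let mut counters: HashMap<u32, u32> = HashMap::new();
    group_ids
        .iter()
        .map(|g| {
            // Increment this group's counter and emit the new value.
            let count = counters.entry(*g).or_insert(0);
            *count += 1;
            *count
        })
        .collect()
}

fn main() {
    // Hypothetical groupby_id column; groups need not be contiguous.
    let groupby_id: [u32; 5] = [7, 7, 3, 7, 3];
    println!("{:?}", seq_within_group(&groupby_id)); // [1, 2, 1, 3, 2]
}
```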
0 votes · 0 answers

How to convert a Rust Arrow Vec to a Polars DataFrame (df)?

I have read my file into a variable, data_batch, which is a Vec, and now I want to convert it to a Polars DataFrame (df). I have no idea how to do this, and there are no examples of this... I expect converting to a Polars DataFrame to be possible, as I have properly…
andy8203
0 votes · 0 answers

Polars: how do I read JSON Lines from S3?

I am at a loss as to which Polars interface I should pass s3_bytes (from a JSON Lines file) to in order to get a DataFrame: impl S3_Operation { pub async fn new(file: File) -> Self { let config = aws_config::load_from_env().await; let client =…
andy8203
0 votes · 1 answer

How do I stack a wide Polars DataFrame in Rust into a narrow DataFrame?

In R, I am stacking a data.frame like so: stack(preds[1:(ncol(preds)-4)]). This selects 1,000 columns and stacks them all into a single column, while creating a second, string column holding the name of the column each row originally came from. I…
Jage
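R's stack() returns two columns, values and ind: the selected columns' values laid end to end, and the name of the column each value came from. The following is a minimal plain-Rust sketch that reproduces that layout using only the standard library; it is not the Polars API, and stack and the column names p1 and p2 are hypothetical. In DataFrame libraries this kind of wide-to-long reshaping is usually called melting or unpivoting.

```rust
/// Stack wide columns into two long columns: `values` and `ind` (source column name),
/// column by column, mirroring the layout of R's `stack()`.
fn stack(names: &[&str], columns: &[Vec<f64>]) -> (Vec<f64>, Vec<String>) {
    assert_eq!(names.len(), columns.len(), "one name per column");
    // Pre-allocate the long columns to the total number of values.
    let total: usize = columns.iter().map(|c| c.len()).sum();
    let mut values = Vec::with_capacity(total);
    let mut ind = Vec::with_capacity(total);
    for (name, col) in names.iter().zip(columns) {
        // Append this column's values, then repeat its name once per value.
        values.extend_from_slice(col);
        ind.extend(std::iter::repeat(name.to_string()).take(col.len()));
    }
    (values, ind)
}

fn main() {
    // Two hypothetical prediction columns in wide form.
    let names = ["p1", "p2"];
    let columns = vec![vec![0.1, 0.2], vec![0.3, 0.4]];
    let (values, ind) = stack(&names, &columns);
    for (v, name) in values.iter().zip(&ind) {
        println!("{v}\t{name}");
    }
}
```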
0 votes · 0 answers

How to apply a function to elements of a Polars Series in Rust?

I am trying to convert some code from using an ndarray to using a Polars DataFrame. I have a function that samples from multiple distributions. I create an ndarray of shape number_of_distributions x number_of_samples. Then I iterate over each column, and for…
Jage
0 votes · 1 answer

Rust: transforming a 2D array into a 1D array

Does anyone know why x = df.select(["A"]).unwrap().to_ndarray::().unwrap() is treated as a 2D array when I want it to be a 1D array? Is there a function to reshape it into a 1D array? Here the shape of y is (100, 1). The type of x is a…
Arli94
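In Polars, select returns a DataFrame, and DataFrame::to_ndarray converts a whole DataFrame into a 2-D array, so even a single selected column comes back with shape (rows, 1). The following is a minimal plain-Rust sketch, using only the standard library and not the Polars or ndarray API, of why an (n, 1) matrix holds exactly the n column values in order and can be flattened to 1-D; flatten_row_major and column are hypothetical helpers.

```rust
/// Flatten a matrix stored as rows into one Vec, row by row.
/// For an (n, 1) matrix this is exactly the n values of the single column.
fn flatten_row_major(rows: &[Vec<f64>]) -> Vec<f64> {
    rows.iter().flatten().copied().collect()
}

/// Pull column `j` out as a 1-D Vec; returns None if any row is too short.
fn column(rows: &[Vec<f64>], j: usize) -> Option<Vec<f64>> {
    rows.iter().map(|r| r.get(j).copied()).collect()
}

fn main() {
    // A hypothetical (3, 1) matrix, like the (100, 1) result in the question.
    let x = vec![vec![1.5], vec![2.5], vec![3.5]];
    println!("{:?}", flatten_row_major(&x)); // [1.5, 2.5, 3.5]
    println!("{:?}", column(&x, 0)); // Some([1.5, 2.5, 3.5])
}
```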
0 votes · 1 answer

Converting a Utf8 Series into a Series of List via a custom function in Rust polars

I have a Utf8 column in my DataFrame, and from it I want to create a column of List. In particular, for each row I am taking the text of an HTML document and using soup to parse out all the paragraphs of class …, and storing the collection of…

BrettW
0 votes · 0 answers

Diagonally Concatenate Polars DataFrame but lift Null of nested structs rather than nulling out child fields

I have a heterogeneous container that has a timestamp and one of an inner type, which I am serializing into JSON in batches and loading into DataFrames. #[derive(serde::Serialize)] pub struct Container { pub timestamp: Option, pub…
Dennis Collective
0 votes · 1 answer

Writing long rows using polars DataFrame throws runtime error

I have the following async block of code that runs as part of a larger program. It runs successfully when the DataFrame has a row of length 10 or 30, but when I increase it to a larger number such as 300, it tries to write the DataFrame as Parquet…
ripbozo
0 votes · 1 answer

How to create a Polars DataFrame with a Vec> as a Series

I want a DataFrame like this:

timestamp  | bids                 | asks                 | ticker
---------- | -------------------- | -------------------- | ------
1598215600 | [[10, 20], [15, 30]] | [[20, 10], [25, 20]] | "AAPL"
1598222400 | …
ripbozo
0 votes · 1 answer

Read CSV file into Polars dataframe with Rust

I would like to read a CSV file into a Polars DataFrame. The code copied from the official documentation fails to run with cargo: use polars::prelude::*; use std::fs::File; fn example() -> Result { let file =…
David
0 votes · 0 answers

Apply a function to a column out-of-memory in Python Polars

I have a large GIS dataset (167x25e6) that was converted from GeoJSON via CSV and is now stored as Parquet. This is the first time I have really had to deal with out-of-memory DataFrames, and I am still trying to find out whether Polars is the right option for my…