Questions tagged [python-polars]

Polars is a DataFrame library/in-memory query engine.

The Polars core library is written in Rust and builds on the Apache Arrow columnar memory format, originally through arrow2, a native Rust implementation of Arrow. It offers Python and JavaScript bindings that wrap the functionality implemented in the core library.

1331 questions
2 votes, 2 answers

Make a categorical column which has categories ['a', 'b', 'c'] in Polars

How do I make a Categorical column which has: elements: ['a', 'b', 'a', 'a'] categories ['a', 'b', 'c'] in polars? In pandas, I would do: In [31]: pd.Series(pd.Categorical(['a', 'b', 'a', 'a'], categories=['a', 'b', 'c'])) Out[31]: 0 a 1 b 2…
ignoring_gravity
2 votes, 1 answer

How to create a polars column listing duplicates of another column

I have had a hard time searching for the answer to this because it is hard to put into words. I'm trying to aggregate multiple listings of files on disks, some of which have the same files. I want only one row for a given file, and a separate column…
RandyP
2 votes, 2 answers

How to get current index of element in polars list

When evaluating list elements I would like to know and use the current index. Is there already a way of doing it? Something like pl.element().idx() ? import polars as pl data = {"a": [[1,2,3],[4,5,6]]} schema = {"a": pl.List(pl.Int8)} df =…
2 votes, 1 answer

How to create a new column based on the common start word between two series in a Polars DataFrame?

I have a Polars DataFrame consisting of two series, 'foo' and 'bar', which contain lists of integers. I want to create a new column that assigns a value of 1 if the start word (first element) of the 'foo' series is equal to the start word of the…
tikendraw
2 votes, 1 answer

Splitting a lazyframe into two frames by fraction of rows to make a train-test split

I have a train_test_split function in Polars that can handle an eager DataFrame. I wish to write an equivalent function that can take a LazyFrame as input and return two LazyFrames without evaluating them. My function is as follows. It shuffles all…
TomNorway
2 votes, 2 answers

How to remove last N chars from a string column in python-polars?

Given this dataframe: df = pl.DataFrame({"s": ["pear", None, "papaya", "dragonfruit"]}) I want to remove the last X chars, e.g. remove the last 2 chars from the column. This obviously doesn't do what I want: df.with_columns( …
nos
2 votes, 2 answers

Polars - Count unique values over a time period

I'm migrating a pipeline from pandas to polars. The data is for arrivals and departures of trucks docked in a warehouse. In a certain step of the pipeline I need to calculate the number of trucks that are docked at any given time, that is, for every…
JuanPy
2 votes, 2 answers

Polars is much slower than DuckDB in conditional join + groupby/agg context

In the following example, which involves a self conditional join followed by a groupby/aggregate operation, DuckDB turned out to perform much better than Polars (~10x on a 32-core machine). My questions are: What…
lebesgue
2 votes, 2 answers

Create a new column with the first value that matches a condition

I have a Dataframe similar to this: import polars as pl df = pl.DataFrame({ 'Time': [1, 2, 3, 4, 5, 6, 7, 8, 9], 'Value': [100, 75, 70, 105, 140, 220, 65, 180, 150] }) Represented here: | Time | Value | | 1 | 100 | | 2 |…
Jona Rodrigues
2 votes, 1 answer

How do I transform multiple columns simultaneously in polars dataframe?

I have two dataframes, one of them is just a single row, and I would like to transform each of the columns in the first one with the values in the single row in some fashion. How do I do this? Here's what I want to achieve: df1 = pl.DataFrame({'c1':…
ste_kwr
2 votes, 1 answer

How to create a frequency table in polars from an iterator

I am trying to create a polars dataframe which is a frequency table of words in a list of words. Something like this: from collections import defaultdict word_freq= defaultdict(int) for word in list_of_words: word_freq[word] += 1 Except,…
ste_kwr
2 votes, 1 answer

How do I do a train and test split in a polars dataframe

I am trying to find a simple way of randomly splitting a polars dataframe in train and test. This is how I am doing it right now train, test = df .with_columns(pl.lit(np.random.rand(df0.height)>0.8).alias('split')) …
ste_kwr
2 votes, 0 answers

How to scan partitioned parquet file from gcs into polars?

I am trying to scan a folder of multiple parquet files into a polars dataframe. In this question, the following is given as an answer using s3: from pyarrow.dataset import dataset import gcsfs import polars as pl # setup cloud filesystem…
EricLeer
2 votes, 1 answer

Serializing Polars expressions as JSON or YAML file?

I am extremely happy with the polars expression syntax, so much so that a lot of my feature engineering is expressed in polars expressions. However, I am now trying to move the feature engineering to JSON or YAML files (for MLOps reasons). The…
MYK
2 votes, 1 answer

Polars convert string of digits to list

I have a polars column/series of digit strings. s = pl.Series("a", ["111","123","101"]) s shape: (3,) Series: 'a' [str] [ "111" "123" "101" ] I would like to convert each string into a list of integers. I have found a…
J.N.