Highest Voted 'python-polars' Questions

2 votes, 2 answers

Make a categorical column which has categories ['a', 'b', 'c'] in Polars

How do I make a Categorical column with elements ['a', 'b', 'a', 'a'] and categories ['a', 'b', 'c'] in Polars? In pandas, I would do: In [31]: pd.Series(pd.Categorical(['a', 'b', 'a', 'a'], categories=['a', 'b', 'c'])) Out[31]: 0 a 1 b 2…

python python-polars

asked Jul 04 '23 at 14:16 by ignoring_gravity (reputation 6,677; badges 4, 32, 65)

2 votes, 1 answer

How to create a polars column listing duplicates of another column

I have had a hard time searching for the answer to this, as I find it hard to put into words. I'm trying to aggregate multiple listings of files on disks, some of which have the same files. I want only one row for a given file, and a separate column…

python-polars

asked Jun 23 '23 at 23:09 by RandyP (reputation 497; badges 3, 11)

2 votes, 2 answers

How to get current index of element in polars list

When evaluating list elements I would like to know and use the current index. Is there already a way of doing it? Something like pl.element().idx() ? import polars as pl data = {"a": [[1,2,3],[4,5,6]]} schema = {"a": pl.List(pl.Int8)} df =…

python python-polars

asked Jun 21 '23 at 12:22 by wKollendorf (reputation 47; badges 4)

2 votes, 1 answer

How to create a new column based on the common start word between two series in a Polars DataFrame?

I have a Polars DataFrame consisting of two series, 'foo' and 'bar', which contain lists of integers. I want to create a new column that assigns a value of 1 if the start word (first element) of the 'foo' series is equal to the start word of the…

pandas dataframe python-polars rust-polars

asked Jun 18 '23 at 10:29 by tikendraw (reputation 451; badges 3, 12)

2 votes, 1 answer

Splitting a lazyframe into two frames by fraction of rows to make a train-test split

I have a train_test_split function in Polars that can handle an eager DataFrame. I wish to write an equivalent function that can take a LazyFrame as input and return two LazyFrames without evaluating them. My function is as follows. It shuffles all…

python-polars

asked Jun 18 '23 at 09:30 by TomNorway (reputation 2,584; badges 1, 19, 26)

2 votes, 2 answers

How to remove last N chars from a string column in python-polars?

Given this dataframe: df = pl.DataFrame({"s": ["pear", None, "papaya", "dragonfruit"]}) I want to remove the last X chars, e.g. remove the last 2 chars from the column. This obviously doesn't do what I want: df.with_columns( …

python python-polars

asked Jun 16 '23 at 13:25 by nos (reputation 223,662; badges 58, 417, 506)

2 votes, 2 answers

Polars - Count unique values over a time period

I'm migrating a pipeline from pandas to Polars. The data covers arrivals and departures of trucks docked in a warehouse. In one step of the pipeline I need to calculate the number of trucks that are docked at any given time, that is, for every…

python pandas python-polars

asked Jun 16 '23 at 08:26 by JuanPy (reputation 41; badges 5)

2 votes, 2 answers

Polars is much slower than DuckDB in conditional join + groupby/agg context

The following example involves a conditional self-join followed by a groupby/aggregate operation. In this case, DuckDB turned out to perform much better than Polars (~10x on a 32-core machine). My questions are: What…

python python-polars duckdb

asked Jun 15 '23 at 03:33 by lebesgue (reputation 837; badges 4, 13)

2 votes, 2 answers

Create a new column with the first value that matches a condition

I have a DataFrame similar to this: import polars as pl df = pl.DataFrame({ 'Time': [1, 2, 3, 4, 5, 6, 7, 8, 9], 'Value': [100, 75, 70, 105, 140, 220, 65, 180, 150] }) Represented here: | Time | Value | | 1 | 100 | | 2 |…

python dataframe python-polars

asked Jun 13 '23 at 22:08 by Jona Rodrigues (reputation 992; badges 1, 11, 23)

2 votes, 1 answer

How do I transform multiple columns simultaneously in polars dataframe?

I have two dataframes, one of them is just a single row, and I would like to transform each of the columns in the first one with the values in the single row in some fashion. How do I do this? Here's what I want to achieve: df1 = pl.DataFrame({'c1':…

python-polars rust-polars

asked Jun 12 '23 at 14:58 by ste_kwr (reputation 820; badges 1, 5, 21)

2 votes, 1 answer

How to create a frequency table in polars from an iterator

I am trying to create a polars dataframe which is a frequency table of words in a list of words. Something like this: from collections import defaultdict word_freq= defaultdict(int) for word in list_of_words: word_freq[word] += 1 Except,…

python-polars

asked Jun 10 '23 at 15:07 by ste_kwr (reputation 820; badges 1, 5, 21)

2 votes, 1 answer

How do I do a train and test split in a polars dataframe

I am trying to find a simple way of randomly splitting a Polars dataframe into train and test sets. This is how I am doing it right now: train, test = df .with_columns(pl.lit(np.random.rand(df0.height)>0.8).alias('split')) …

python-polars

asked Jun 09 '23 at 19:15 by ste_kwr (reputation 820; badges 1, 5, 21)

2 votes, 0 answers

How to scan partitioned parquet file from gcs into polars?

I am trying to scan a folder of multiple parquet files into a Polars dataframe. In this question, the following is given as an answer using s3: from pyarrow.dataset import dataset import gcsfs import polars as pl # setup cloud filesystem…

python google-cloud-storage python-polars

asked Jun 07 '23 at 12:31 by EricLeer (reputation 41; badges 4)

2 votes, 1 answer

Serializing Polars expressions as JSON or YAML file?

I am extremely happy with the polars expression syntax, so much so that a lot of my feature engineering is expressed in polars expressions. However, I am now trying to move the feature engineering to JSON or YAML files (for MLOps reasons). The…

python-polars

asked Jun 05 '23 at 10:31 by MYK (reputation 1,988; badges 7, 30)

2 votes, 1 answer

Polars convert string of digits to list

So I have a Polars column/series that contains strings of digits. s = pl.Series("a", ["111","123","101"]) s shape: (3,) Series: 'a' [str] [ "111" "123" "101" ] I would like to convert each string into a list of integers. I have found a…

python python-polars

asked May 28 '23 at 14:20 by J.N. (reputation 153; badges 1, 9)
