Questions tagged [python-polars]

Polars is a DataFrame library/in-memory query engine.

The Polars core library is written in Rust and is built on Apache Arrow, using the native Rust implementation arrow2 as its foundation. It offers Python and JavaScript bindings, which wrap the functionality implemented in the core library.

1331 questions
4 votes, 1 answer

Idiomatic replacement of empty string '' with pl.Null (null) in polars

I have a polars DataFrame with a number of Series that look like pl.Series(['cow', 'cat', '', 'lobster', '']), and I'd like them to be pl.Series(['cow', 'cat', pl.Null, 'lobster', pl.Null]). A simple string replacement won't work since pl.Null is…
user6268172
4 votes, 2 answers

Compare two polars DataFrames for equality

How do I compare two polars DataFrames for value equality? It appears that == is only true if the two tables are the same object:
    import polars as pl
    pl.DataFrame({"x": [1,2,3]}) == pl.DataFrame({"x": [1,2,3]})  # False
drhagen
4 votes, 1 answer

Joining dataframes using rust polars in Python

I am experimenting with polars and would like to understand why using polars is slower than using pandas on a particular example: import pandas as pd import polars as pl n=10_000_000 df1 = pd.DataFrame(range(n), columns=['a']) df2 =…
SultanOrazbayev
4 votes, 2 answers

How to select rows between a certain date range in python-polars?

If a DataFrame is constructed like the following using polars-python: import polars as pl from polars import col from datetime import datetime df = pl.DataFrame({ "dates": ["2016-07-02", "2016-08-10", "2016-08-31", "2016-09-10"], "values":…
pythonic833
3 votes, 2 answers

How can I reduce the amount of data in a polars DataFrame?

I have a CSV file with a size of 28 GB, which I want to plot. That is obviously far too many data points, so how can I reduce the data? I would like to merge about 1000 data points into one by calculating the mean. This is the structure of my…
Jan
3 votes, 1 answer

Polars expression when().then().otherwise is slow

I noticed something in Python polars. I'm not sure, but it seems that pl.when().then().otherwise() is slow somewhere. For instance, for this dataframe: df = pl.DataFrame({ 'A': [randint(1, 10**15) for _ in range(30_000_000)], 'B': [randint(1, 10**15)…
s-b90
3 votes, 1 answer

Pandas.eval replacement in polars

Suppose I have an expression like "col3 = col2 + col1". In pandas we can directly call pandas.DataFrame.eval(), but in polars I cannot find such a method. I found series.eval in polars, but had no luck, as I want to evaluate a user-given expression on a dataframe.
3 votes, 2 answers

DST temporal feature from timestamp using polars

I'm migrating code to polars from pandas. I have time-series data consisting of a timestamp and value column and I need to compute a bunch of features. i.e. df = pl.DataFrame({ "timestamp": pl.date_range( datetime(2017, 1, 1), …
David Waterworth
3 votes, 1 answer

Polars memory usage as compared to {data.table}

I'm fairly new to python-polars. How does it compare to R's {data.table} package in terms of memory usage? How does it handle shallow copying? Is in-place/by-reference updating possible, or even the default? Are there any recent benchmarks on memory efficiency of…
persephone
3 votes, 2 answers

How do I fill in missing factors in a polars dataframe?

I have this dataframe: testdf = pl.DataFrame({'date':['date1','date1','date1','date2','date3','date3'], 'factor':['A','B','C','B','B','C'], 'val':[1,2,3,3,1,5]}) Some of the factors are missing. I'd like to fill in the gaps with values 0. This is…
ste_kwr
3 votes, 2 answers

In a Polars groupby aggregation, how do you concatenate string values in each group?

When grouping a Polars dataframe in Python, how do you concatenate string values from a single column across rows within each group? For example, given the following DataFrame: import polars as pl df = pl.DataFrame( { "col1": ["a", "b",…
3 votes, 1 answer

How to create a rank on two columns in python polars?

Suppose we have this dataframe in polars (python): import polars as pl df = pl.DataFrame( { "era": ["01", "01", "02", "02", "03", "03"], "pred": [3,5,6,8,9,1] } ) I can create a rank/row_number based on one column,…
lmocsi
3 votes, 2 answers

Column- and row-wise logical operations on Polars DataFrame

In Pandas, one can perform boolean operations on boolean DataFrames with the all and any methods, providing an axis argument. For example: import pandas as pd data = dict(A=["a","b","?"], B=["d","?","f"]) pd_df = pd.DataFrame(data) For example, to…
Qunatized
3 votes, 0 answers

What is the difference between polars.collect_all and polars.LazyFrame.collect?

Starting with the example below: import time import numpy as np import polars as pl n_index = 1000 n_a = 10 n_b = 500 n_obs = 5000000 df = pl.DataFrame( { "id": np.random.randint(0, n_index, size=n_obs), "a":…
lebesgue
3 votes, 1 answer

Concise way to retrieve a row from a Polars DataFrame with an iterator of column-value pairs

I often need to retrieve a row from a Polars DataFrame given a collection of column values, like I might use a composite key in a database. This is possible in Polars using DataFrame.row, but the resulting expression is very verbose: row_index =…
Etherian