Questions tagged [python-polars]

Polars is a DataFrame library/in-memory query engine.

The Polars core library is written in Rust and is built on Arrow, using the native arrow2 Rust implementation as its foundation. It offers Python and JavaScript bindings, which serve as wrappers for functionality implemented in the core library.

1331 questions
5
votes
2 answers

Make a constant column in Polars

In Polars 0.13.14, I could create a DataFrame with an all-constant column like this: import polars as pl pl.DataFrame(dict(x=pl.repeat(1, 3))) # shape: (3, 1) # ┌─────┐ # │ x │ # │ --- │ # │ i64 │ # ╞═════╡ # │ 1 │ # ├╌╌╌╌╌┤ # │ 1 │ #…
drhagen
  • 8,331
  • 8
  • 53
  • 82
5
votes
1 answer

Is there an Apache Arrow equivalent of the Spark Pandas UDF

Spark provides a few different ways to implement UDFs that consume and return Pandas DataFrames. I am currently using the cogrouped version that takes two (co-grouped) Pandas DataFrames as input and returns a third. For efficient translation between…
5
votes
3 answers

Polars: How to reorder columns in a specific order?

I cannot find how to reorder columns in a polars dataframe in the polars DataFrame docs. thx
rchitect-of-info
  • 1,150
  • 1
  • 11
  • 23
5
votes
3 answers

Polars: Search and replace in column names

In pandas, this used to be handled like so: df.columns = df.columns.str.replace('.','_') The following code works but definitely doesn't feel like the correct solution: renamed = {} for column_name in list(filter(lambda x: '.' in x, df.columns)): …
rchitect-of-info
  • 1,150
  • 1
  • 11
  • 23
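One way to express the rename loop from the excerpt above more compactly is a dict comprehension that builds the old-to-new name mapping in a single pass. This is only a sketch, not the accepted answer: the column names below are hypothetical stand-ins for `df.columns`, and it assumes the resulting mapping would then be handed to `DataFrame.rename`.

```python
# Hypothetical column names standing in for df.columns
columns = ["user.id", "user.name", "total"]

# Map only the names that contain a dot to their underscore form
rename_map = {name: name.replace(".", "_") for name in columns if "." in name}

print(rename_map)
# With a Polars DataFrame, this mapping would be passed as df.rename(rename_map)
```

Columns without a dot are left out of the mapping, so they keep their original names when it is applied.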
5
votes
1 answer

What is the Polars equivalent of Pandas `.isna()` method?

I'm trying to replace Pandas with Polars in production code, for better memory performance. What would be the Polars equivalent of Pandas .isna() method? I couldn't find any good equivalent in the doc.
bolino
  • 867
  • 1
  • 10
  • 27
4
votes
2 answers

Alternatives for long .when().then().when().then().otherwise() chains

Are there clever alternatives to writing long when().then().otherwise() chains without hardcoding the values? See the example below. Let's say we have the following dataframe: df = pl.DataFrame( { "Market":["AT", "AT", "DE", "DE", "CA",…
miroslaavi
  • 361
  • 2
  • 7
4
votes
1 answer

How to do if and else in Polars groupby context

For a dataframe, the goal is to compute the mean of column a, grouped by column b, given that the first value of a in the group is not null; if it is null, just return null. The sample dataframe: df = pl.DataFrame({"a": [None, 1, 2, 3, 4], "b": [1, 1, 2,…
lebesgue
  • 837
  • 4
  • 13
4
votes
1 answer

how to do a qcut by group in polars?

Consider the following example zz = pl.DataFrame({'group' : ['a','a','a','a','b','b','b'], 'col' : [1,2,3,4,1,3,2]}) zz Out[16]: shape: (7, 2) ┌───────┬─────┐ │ group ┆ col │ │ --- ┆ --- │ │ str ┆ i64 │ ╞═══════╪═════╡ │ a ┆…
ℕʘʘḆḽḘ
  • 18,566
  • 34
  • 128
  • 235
4
votes
1 answer

Filter polars DataFrame rows whose specific columns contain pairs from a list of pairs

In this example, on columns ["foo", "ham"], I want rows 1 and 4 to be removed since they match a pair in the list df = pl.DataFrame( { "foo": [1, 1, 2, 2, 3, 3, 4], "bar": [6, 7, 8, 9, 10, 11, 12], "ham": ["a", "b", "c",…
pikaft
  • 43
  • 5
4
votes
2 answers

Polars vs. Pandas: size and speed difference

I have a parquet file (~1.5 GB) which I want to process with polars. The resulting dataframe has 250k rows and 10 columns. One column has large chunks of text in it. I have just started using polars, because I heard many good things about it. One…
FredMaster
  • 1,211
  • 1
  • 15
  • 35
4
votes
1 answer

Count consecutive True (or 1) values in a Boolean (or numeric) column with Polars?

I am hoping to count consecutive values in a column, preferably using Polars expressions. import polars as pl df = pl.DataFrame( {"values": [True,True,True,False,False,True,False,False,True,True]} ) With the example data frame above, I would like to…
JGrant06
  • 53
  • 4
4
votes
1 answer

Polars table convert a list column to separate rows i.e. unnest a list column to multiple rows

I have a Polars dataframe in the form: df = pl.DataFrame({'a':[1,2,3], 'b':[['a','b'],['a'],['c','d']]}) ┌─────┬────────────┐ │ a ┆ b │ │ --- ┆ --- │ │ i64 ┆ list[str] │ ╞═════╪════════════╡ │ 1 ┆ ["a", "b"] │ │ 2 ┆ ["a"] …
kristianp
  • 5,496
  • 37
  • 56
4
votes
2 answers

Float Decimal Point Display Setting in Polars

Is there a way to change a setting so that Polars shows the same number of decimal places for all values? And if so, can I save it as the default for all new notebooks, in Jupyter for instance? For example, pl.DataFrame({"a":[0.1213,…
miroslaavi
  • 361
  • 2
  • 7
4
votes
2 answers

Performing integer-based rolling window groupby using Python Polars

I have an outer/inner loop-based function I'm trying to vectorise using Python Polars DataFrames. The function is a type of moving average and will be used to filter time-series financial data. Here's the function: def ma_j(df_src: pl.DataFrame,…
Paul
  • 135
  • 1
  • 9
4
votes
3 answers

Connect python-polars to SQL server (no support currently)

How can I directly connect MS SQL Server to polars? The documentation does not list any supported connections but recommends the use of pandas. Update: SQL Server Authentication works per answer, but Windows domain authentication is not working. see…
Isaacnfairplay
  • 217
  • 2
  • 18