Questions tagged [python-polars]

Polars is a DataFrame library/in-memory query engine.

The Polars core library is written in Rust and is built on Apache Arrow, using the native Rust implementation arrow2. It offers Python and JavaScript bindings, which wrap the functionality implemented in the core library.

1331 questions
0 votes
2 answers

Python Polars: Read Column as Datetime

How does one read a CSV into a Polars DataFrame and parse one of the columns as a datetime? Alternatively, how does one convert a column to a pl.Datetime?
Test
  • 962
  • 9
  • 26
0 votes
3 answers

Broadcast in agg when needed

In Polars, the select and with_column methods broadcast any scalars that they get, including literals: import polars as pl df.with_column(pl.lit(1).alias("y")) # shape: (3, 2) # ┌─────┬─────┐ # │ x ┆ y │ # │ --- ┆ --- │ # │ i64 ┆ i64 │ #…
drhagen
  • 8,331
  • 8
  • 53
  • 82
0 votes
1 answer

Python Polars Read Zipped CSV

How does one read a zipped CSV file into a Python Polars DataFrame? The only solution I have found so far is to load the entire file into memory and then pass it to pl.read_csv.
Test
  • 962
  • 9
  • 26
0 votes
1 answer

Filtering on a large number (hundreds) of conditions

I have a largish dataframe (5.5M rows, four columns). The first column (let's call it column A) has 235 distinct entries. The second column (B) has 100 distinct entries, integers from 0 to 99, all present in various proportions for each entry in…
Aubergine
  • 368
  • 3
  • 12
0 votes
2 answers

How to filter record "sequences" from a Polars dataframe using multiple threads?

I have a data set with multiple records on each individual - one record for each time period. Where an individual is missing a record for a time period, I need to remove any later records for that individual. So given an example dataset like…
Tikkanz
  • 2,398
  • 15
  • 21
0 votes
1 answer

Python Polars regex - remove non-English text, keep numbers, punctuation and emojis

I have Python code for the task: import re import string emoji_pat = '[\U0001F300-\U0001F64F\U0001F680-\U0001F6FF\u2600-\u26FF\u2700-\u27BF]' shrink_whitespace_reg = re.compile(r'\s{2,}') def clean_text(raw_text): reg =…
MPA
  • 1,011
  • 7
  • 22
0 votes
1 answer

How to cast a column with data type List[null] to List[i64] in polars

I have the following problem: I'd like to use the Polars apply function on columns with the datatype List. In most cases this works, but in some cases all lists in the column are empty and the column datatype is List[null]. In that special case…
seb2704
  • 390
  • 1
  • 5
  • 17
0 votes
0 answers

Polars: cannot connect to PostgreSQL using pl.read_sql

According to the docs at https://pola-rs.github.io/polars-book/user-guide/howcani/io/read_db.html: import polars as pl conn = "postgres://username:password@server:port/database" query = "SELECT * FROM foo" pl.read_sql(query,…
Hengaini
  • 44
  • 5
0 votes
1 answer

Supply a literal array to a Polars expression

How can I define a literal array in a Polars expression? For example, if I wanted to filter if an expression was true and a given value in a mask was true. import polars as pl df = pl.DataFrame(dict(x=[1,2,3,4,5,6])) mask = [True, True, False,…
drhagen
  • 8,331
  • 8
  • 53
  • 82
0 votes
1 answer

Take elements from each group in Polars

How can I take elements by index within each group of a Polars DataFrame? For example, if I wanted to get the first and third element of each group, I might try something like this: import polars as pl df = pl.DataFrame(dict(x=[1,0,1,0,1,0],…
drhagen
  • 8,331
  • 8
  • 53
  • 82
0 votes
1 answer

How to cast an i64 Series/ChunkedArray into f64 or String in polars-rust?

Given a Series that comes from a column of a DataFrame, how can I cast it into an f64 Series or an f64 ChunkedArray? It seems that .apply_cast_numeric(|v| v as f64) fails.
Hakase
  • 211
  • 1
  • 12
0 votes
2 answers

Lazy filter depending on the previous line (Polars Python)

I'm using Python Polars and I have a table like this:

Column1  Column2
id1      1
id1      1
id1      2
id1      1
id1      1
id1      2
id1      3

I would like, using the Polars Lazy API, to get the result where the previous element of Column2 is different from the…
Zorp
  • 75
  • 5
0 votes
1 answer

Cluster rows with same values without sorting

Sorting by particular columns brings together all rows with the same tuple under those columns. I want to cluster all rows with the same value, but keep the groups in the same order in which their first member appeared. Something like this: import…
drhagen
  • 8,331
  • 8
  • 53
  • 82
0 votes
1 answer

Polars converters like pandas

Pandas read_csv accepts converters to pre-process each field. This is very useful, especially for int64 validation or mixed date formats. Could you please provide a way to read multiple columns as pl.Utf8 and then cast them to Int64, Float64, Date etc…
0 votes
1 answer

Is there a function similar to idxmax() in py-polars groupby?

import polars as pl import pandas as pd A = ['a','a','a','a','a','a','a','b','b','b','b','b','b','b'] B = [1,2,3,4,5,6,7,8,9,10,11,12,13,14] df = pl.DataFrame({'cola':A, 'colb':B}) df_pd = df.to_pandas() index =…
thunderwu
  • 33
  • 5