Questions tagged [python-polars]

Polars is a DataFrame library and in-memory query engine.

The Polars core library is written in Rust and is built on Arrow, using the native arrow2 Rust implementation. It offers Python and JavaScript bindings that wrap the functionality implemented in the core library.

1331 questions
8
votes
1 answer

How to properly display a Polars dataframe in VSCode Jupyter Notebook variables inspector

Edit: This has been filed as a bug in the Polars repository: https://github.com/pola-rs/polars/issues/6152 and in the VSCode Jupyter repo: https://github.com/microsoft/vscode-jupyter/issues/12519 I am testing Python-Polars inside a Jupyter notebook in…
Raphael
  • 810
  • 6
  • 18
8
votes
4 answers

How to transform Spark dataframe to Polars dataframe?

I wonder how I can transform a Spark dataframe to a Polars dataframe. Let's say I have this code in PySpark: df = spark.sql('''select * from tmp''') I can easily transform it to a pandas dataframe using .toPandas. Is there something similar in polars, as…
s1nbad
  • 83
  • 1
  • 4
8
votes
2 answers

Polars: Specify dtypes for all columns at once in read_csv

In Polars, how can one specify a single dtype for all columns in read_csv? According to the docs, the dtypes argument to read_csv can take either a mapping (dict) in the form of {'column_name': dtype}, or a list of dtypes, one for each…
daviewales
  • 2,144
  • 21
  • 30
7
votes
3 answers

Sample from each group in polars dataframe?

I'm looking for a function along the lines of df.groupby('column').agg(sample(10)) so that I can take ten or so randomly-selected elements from each group. This is specifically so I can read in a LazyFrame and work with a small sample of each group…
user6268172
7
votes
1 answer

How to write polars custom apply function that does the processing row by row?

I need to create a new column in my dataframe that stores the processed values. So I used the polars apply function to do some processing of DICOMs and then return a value. But this apply function by default takes the entire column as a polars Series and it…
Pradeepgb
  • 71
  • 1
  • 4
7
votes
2 answers

Error while converting pandas dataframe to polars dataframe (pyarrow.lib.ArrowTypeError: Expected bytes, got a 'int' object)

I am converting a pandas dataframe to a polars dataframe, but pyarrow throws an error. My code: import polars as pl import pandas as pd if __name__ == "__main__": with open(r"test.xlsx", "rb") as f: excelfile = f.read() excelfile =…
Rahil
  • 183
  • 1
  • 2
  • 11
6
votes
1 answer

How to convert time durations to numeric in polars?

Is there any built-in function in polars or a better way to convert time durations to numeric by defining the time resolution (e.g.: days, hours, minutes)? # Create a dataframe df = pl.DataFrame( { "from": ["2023-01-01", "2023-01-02",…
Guz
  • 387
  • 3
  • 21
6
votes
2 answers

Polars Looping through the rows in a dataset

I am trying to loop through a Polars recordset using the following code: import polars as pl mydf = pl.DataFrame( {"start_date": ["2020-01-02", "2020-01-03", "2020-01-04"], "Name": ["John", "Joe", "James"]}) print(mydf) │start_date ┆…
John Smith
  • 2,448
  • 7
  • 54
  • 78
6
votes
0 answers

Issue while using py-polars sink_parquet method on a LazyFrame

I am getting the below error while using sink_parquet on a LazyFrame. Earlier I was using .collect() on the output of the scan_parquet() to convert the result into a DataFrame but unfortunately it is not working with larger than RAM datasets. Here…
Niladri
  • 5,832
  • 2
  • 23
  • 41
6
votes
1 answer

polars slower than numpy?

I was thinking about using polars in place of numpy in a parsing problem where I turn a structured text file into a character table and operate on different columns. However, it seems that polars is about 5 times slower than numpy in most operations…
Qunatized
  • 197
  • 1
  • 9
6
votes
2 answers

How to do regression (simple linear for example) in polars select or groupby context?

I am using polars in place of pandas. I am quite amazed by the speed and lazy computation/evaluation. Right now, there are a lot of methods on lazy dataframe, but they can only drive me so far. So, I am wondering what is the best way to use polars…
lebesgue
  • 837
  • 4
  • 13
6
votes
3 answers

Mapping a Python dict to a Polars series

In Pandas we can use the map function to map a dict to a series to create another series with the mapped values. More generally speaking, I believe it invokes the index operator of the argument, i.e. []. import pandas as pd dic = { 1: 'a', 2: 'b',…
T.H Rice
  • 117
  • 8
6
votes
2 answers

How to use polars dataframes with scikit-learn?

I'm unable to use polars dataframes with scikit-learn for ML training. Currently I'm doing all the dataframe preprocessing in polars, and during model training I'm converting it into a pandas one in order for it to work. Is there any method to…
RKCH
  • 219
  • 3
  • 9
6
votes
5 answers

python-polars split string column into many columns by delimiter

In pandas, the following code will split the string from col1 into many columns. Is there a way to do this in polars? d = {'col1': ["a/b/c/d", "a/b/c/d"]} df = pd.DataFrame(data=d) df[["a","b","c","d"]] = df["col1"].str.split('/', expand=True)
tommyt
  • 309
  • 5
  • 15
6
votes
1 answer

Polars - Replace part of string in column with value of other column

I have a Polars dataframe that looks like this: df = pl.DataFrame( { "ItemId": [15148, 15148, 24957], "SuffixFactor": [19200, 200, 24], "ItemRand": [254, -1, -44], "Stat0": ['+5 Defense', '+$i Might', '+9…
Shamatix
  • 77
  • 1
  • 6