Questions tagged [python-polars]

Polars is a DataFrame library/in-memory query engine.

The Polars core library is written in Rust and is built on Arrow, using the native Rust implementation arrow2 as its foundation. It offers Python and JavaScript bindings, which wrap functionality implemented in the core library.

1331 questions
3
votes
1 answer

python-polars casting string to numeric

When applying pandas.to_numeric, the returned dtype is float64 or int64 depending on the data supplied (https://pandas.pydata.org/docs/reference/api/pandas.to_numeric.html). Is there an equivalent way to do this in Polars? I have seen this How to cast a…
tommyt
  • 309
  • 5
  • 15
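
The pandas behavior described in this excerpt can be illustrated with a minimal sketch. The sample values below are hypothetical and not taken from the question; the sketch only shows the pandas side and does not claim to be the Polars equivalent being asked about.

```python
import pandas as pd

# Hypothetical sample data, not taken from the question.
whole = pd.Series(["1", "2", "3"])
mixed = pd.Series(["1", "2.5", "3"])

# pandas.to_numeric picks the dtype from the parsed values.
whole_num = pd.to_numeric(whole)  # every value parses as an integer
mixed_num = pd.to_numeric(mixed)  # one value has a fractional part

print(whole_num.dtype, mixed_num.dtype)
```

Here the all-integer strings come back as int64, while a single fractional value is enough to produce float64.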
3
votes
1 answer

Is there a Pandas Profiling-like implementation built on Polars?

We use Pandas and Pandas Profiling extensively in our projects to generate profile reports. We were going to explore using Polars as a Pandas alternative and wanted to check if there were any implementations like Pandas Profiling built on top of…
Shashi Deshetti
  • 1,354
  • 2
  • 11
  • 22
3
votes
2 answers

How to use apply functions with multiple parameters?

Now I have a dataframe: df = pd.DataFrame({"a":[1,2,3,4,5],"b":[2,3,4,5,6],"c":[3,4,5,6,7]}) and the function: def fun(a,b,shift_len): return a+b*shift_len,b-shift_len I can get the result by: df[["d","e"]] = df.apply(lambda…
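
To make clear what the function in this excerpt computes per row, here is a rough plain-Python sketch that uses no DataFrame library. The value of shift_len is an assumption chosen for illustration, and the list comprehension stands in for the truncated apply call rather than reproducing it.

```python
def fun(a, b, shift_len):
    # Same body as the function quoted in the excerpt.
    return a + b * shift_len, b - shift_len

# Column values copied from the excerpt's DataFrame literal.
columns = {"a": [1, 2, 3, 4, 5], "b": [2, 3, 4, 5, 6], "c": [3, 4, 5, 6, 7]}

shift_len = 2  # hypothetical value; the excerpt does not show the one used

# Apply fun to each (a, b) pair, then split the result tuples into two columns.
pairs = [fun(a, b, shift_len) for a, b in zip(columns["a"], columns["b"])]
d, e = (list(values) for values in zip(*pairs))

print(d)
print(e)
```

Each call returns two values, which is why the excerpt assigns the result to the two columns "d" and "e".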
3
votes
1 answer

How to transform a series of a Polars dataframe?

I am dealing with a large dataframe (198,619 rows x 19,110 columns) and so am using the polars package to read in the tsv file. Pandas just takes too long. However, I now face an issue as I want to transform each cell's value x raising it by base 2…
nattzy
  • 87
  • 1
  • 6
3
votes
2 answers

Select all columns where column name starts with string

Given the following dataframe, is there some way to select only columns starting with a given prefix? I know I could do e.g. pl.col(column) for column in df.columns if column.startswith("prefix_"), but I'm wondering if I can do it as part of a…
TomNorway
  • 2,584
  • 1
  • 19
  • 26
3
votes
1 answer

How to mask a polars dataframe using another dataframe?

I have a polars dataframe like so: pl.DataFrame({ 'time': [datetime(2021, 10, 2, 0, 5), datetime(2021, 10, 2, 0, 10)], '1': [2.9048, 48224.0], '2': [2.8849, 48068.0] }) and a masking dataframe with similar columns and time value like…
clem
  • 35
  • 5
3
votes
2 answers

Add timedelta to a date column above weeks

How would I add 1 year to a column? I've tried using map and apply but I failed miserably. I also wonder why pl.date() accepts integers while it advertises that it only accepts str or pli.Expr. A small hack workaround is: col = pl.col('date').dt df…
supersick
  • 261
  • 2
  • 14
3
votes
1 answer

Why is Polars called the fastest DataFrame library? Isn't Dask with cuDF more powerful?

Most of the benchmarks have Dask and cuDF isolated, but I can use them together. Wouldn't Dask with cuDF be faster than Polars? Also, Polars only runs if the data fits in memory, but this isn't the case with Dask. So why is there…
zacko
  • 179
  • 2
  • 9
3
votes
1 answer

How to fill n random rows after filtering in polars

I've been thinking for a few hours about how to fill n rows after filtering in Polars with some value. To give you an example, I'd like to do the following operation in Polars. Given a dataframe with column a that has 1s and 2s, we want to create column b…
3
votes
1 answer

How to use polars.concat_str to combine multiple columns selected by regex?

I am having trouble merging columns into one. Say I have a dataframe (df) like below: >> print(df) shape: (3, 4) ┌─────┬───────┬───────┬───────┐ │ a ┆ b_a_1 ┆ b_a_2 ┆ b_a_3 │ │ --- ┆ --- ┆ --- ┆ --- │ │ i64 ┆ str ┆ str ┆ str …
Thi An
  • 47
  • 1
  • 5
3
votes
2 answers

Polars: add the sum of some columns inside select/with_column call

I would like to add a column that is the sum of all columns but some id columns with polars. This can be done using polars.DataFrame.sum(axis=1): import polars as pl df = pl.DataFrame( { "id": [1, 2], "cat_a": [2, 7], …
datenzauber.ai
  • 379
  • 2
  • 11
3
votes
1 answer

What is the exact meaning of the `pl.col("")` expression with an empty string argument?

The example in a section about 'list context' in the polars-book uses the pl.col("") expression with an empty string "" as the argument. # the percentage rank expression rank_pct = pl.col("").rank(reverse=True) / pl.col("").count() From the context and…
3
votes
1 answer

What's the equivalent of `pandas.Series.map(json.loads)` in polars?

Based on the Polars documentation, one can use json_path_match to extract JSON fields into a string series. But can we do something like pandas.Series.map(json.loads) to convert the whole JSON string at once? One can then further convert the loaded JSON…
Saddle Point
  • 3,074
  • 4
  • 23
  • 33
3
votes
1 answer

Convert Pandas pivot_table function into Polars pivot Function

I'm trying to convert some Python pandas code into Polars. I'm stuck trying to convert the pandas pivot_table function into Polars. The following is the working pandas code. I can't seem to get the same behavior with the Polars pivot function. The polars…
Butter_
  • 43
  • 1
  • 4
3
votes
1 answer

How can I share a lazy dataframe between different runtimes?

I have a desktop application where the majority of calculations (>90%) happen on the Rust side of it. But I want the user to be able to write scripts in Python that will operate on the df. Can this be done without serializing the dataframe between…
mainrs
  • 43
  • 3