Questions tagged [python-polars]

Polars is a DataFrame library/in-memory query engine.

The Polars core library is written in Rust and uses Apache Arrow as its foundation, via arrow2, a native Rust implementation of Arrow. It offers Python and JavaScript bindings that wrap the functionality implemented in the core library.


1331 questions
0 votes, 1 answer

Polars string column to pl.datetime in Polars: conversion issue

Working with a CSV file with the following schema: 'Ticket ID': polars.datatypes.Int64, .. 'Created time': polars.datatypes.Utf8, 'Due by Time': polars.datatypes.Utf8, .. Converting to Datetime: df = ( df.lazy() .select(list_cols) …
DBOak
0 votes, 1 answer

How to get an index of maximum count of a required string in a list column of polars data frame?

I have a polars dataframe as pl.DataFrame({'doc_id':[ ['83;45;32;65;13','7;8;9'], ['9;4;5','4;2;7;3;5;8;10;11'], ['1000;2000','76;34;100001;7474;2924'], ['100;200','200;100'], ['3;4;6;7;10;11','1;2;3;4;5'] ]}) each list consist…
myamulla_ciencia
0 votes, 2 answers

Connecting to Azure storage account to read parquet file via managed identity using polars library

I am using the Python version of the polars library to read a parquet file with a large number of rows. Here is the link to the library - https://github.com/pola-rs/polars I am trying to read a parquet file from an Azure storage account using the read_parquet…
Niladri
0 votes, 1 answer

How to cumulatively sum the first elements when using the .over() function for a specific column in Python Polars

I was wondering if someone could please enlighten me. I am trying to cumulatively sum pty_nber over/groupby a specific column (Declaration). My original idea was to use something…
0 votes, 1 answer

Conditional assignment in polars dataframe

I am wondering if there's a way to handle conditional assignment in a polars dataframe without using NumPy. import pandas as pd df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'C'], 'conference': ['East', 'East', 'East',…
codedancer
0 votes, 1 answer

Can I use indexing [start:end] in expressions instead of offset and length in polars

In Expr.slice, the only supported syntax is expr.slice(offset, length), but in some cases something like expr[start:end] would be more convenient and consistent. So how can I write expr[1:6] instead of expr.slice(1, 5), just as in pandas?
Facet
0 votes, 0 answers

Diagonal concatenate of two dataframes not working in Polars

I'm trying to concatenate two dataframes with Polars in Python, and it keeps throwing an error despite my syntax appearing to be correct based on the docs. Specifically, I've got ldf_a with shape: (4, 33) and ldf_b with shape: (4, 33). When I…
0 votes, 1 answer

Polars: Search and replace in column names: is it possible with LazyFrames?

A follow-up to an already answered question: is it possible to search and replace column names in a LazyFrame? I am doing this as a workaround (based on the linked answer by ritchie46, and thanks for that!): df = df.lazy().collect() df.columns =…
DBOak
0 votes, 1 answer

How to add a new field with the counts per group criteria in python polars?

I have a small use case and here is a polars dataframe. df_names = pl.DataFrame({'LN':['Mallesham','Bhavik','Mallesham','Bhavik','Mahesh','Naresh','Sharath','Rakesh','Mallesham'], …
myamulla_ciencia
0 votes, 1 answer

Conditional sum by columns in Polars Python

I understand how to perform a conditional sum in columns, but I am wondering how to achieve a similar approach and end up with a dataframe. import pandas as pd import df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'C'], …
codedancer
0 votes, 2 answers

How to get fuzzy matches of given set of names in python polars dataframe?

I'm trying to implement name deduplication for one of our use cases. Here I have a set of 10 names along with their index column, as below. I would like to calculate fuzzy metrics (Levenshtein, JaroWinkler) for each combination of names using a…
myamulla_ciencia
0 votes, 0 answers

Numerical stability of Expr.mean in GroupBy context

Numerical stability of Expr.mean in GroupBy context (GroupBy.mean()) seems not only worse than the pandas version, but also worse than Expr.mean in select context. import numpy as np import pandas as pd import polars as pl df =…
taozuoqiao
0 votes, 1 answer

How can you sort the columns of a Polars DataFrame alphabetically within a query?

I have a Polars dataframe and want to sort the columns alphabetically without access to a dataframe variable. In this example I want the column order to be ['a','b']: import polars as pl df = pl.DataFrame({'b':[0,1],'a':[2,3]}) If I have the…
braaannigan
0 votes, 2 answers

How to work with date format columns in python polars dataframe?

I have an Excel spreadsheet (.xlsx) with a date of birth column as below. On loading it using the syntax below: pl.read_excel(r'C:\datos\test.xlsx',read_csv_options={'parse_dates':False}) the dates of birth are changing into a two-digit year format…
myamulla_ciencia
0 votes, 1 answer

Polars groupby cannot get mean of datetime column

I have a dataframe with a column of datetimes, a column of floats, and a column of integers like this: ┌─────────────────────────┬───────────┬─────────────┐ │ time ┆ NAV_DEPTH ┆ coarse_ints │ │ --- ┆ --- …
Callum Rollo