Highest Voted 'python-polars' Questions

0

votes

3 answers

polars dropna equivalent on list of columns

I'm a new polars user. Pandas has df.dropna. I need to replace this functionality, but I haven't found a dropna in polars. Searching for dropna currently yields no results in the Polars User Guide. My specific problem: convert the following statement…

dataframe python-polars

asked Oct 06 '22 at 08:57

Callum Rollo

515
3
12

0

votes

0 answers

Lazily reading a parquet file with binary datatype in PyPolars

I hope this is a good question; if I should post this as an issue on the PyPolars GitHub instead, please let me know. I have a quite large parquet file where some columns contain binary data. These columns are not interesting for me right now, so it…

python python-polars

asked Oct 05 '22 at 15:49

dashdeckers

45
3

0

votes

1 answer

Principles of immutability and copy-on-write in polars python api

Hi, I'm working on this fan fiction project of a full feature + syntax translation of pypolars to R called "minipolars". I understand that the pypolars API, e.g. DataFrame, in general elicits immutable behavior, or roughly the same as 'copy-on-write' behaviour.…

python-polars

asked Oct 03 '22 at 10:28

Soren Havelund Welling

1,823
1
16
23

0

votes

1 answer

Cluster a column

I have a column I want to cluster: df = pl.DataFrame({"values": [0.1, 0.5, 0.7, -0.2, 0.4, -0.7, 0.05]}) shape: (7, 1) ┌────────┐ │ values │ │ --- │ │ f64 │ ╞════════╡ │ 0.1 │ ├╌╌╌╌╌╌╌╌┤ │ 0.5 │ ├╌╌╌╌╌╌╌╌┤ │ 0.7 │ ├╌╌╌╌╌╌╌╌┤ │ -0.2 …

python-polars

asked Oct 03 '22 at 09:31

Sigi

53
8

0

votes

0 answers

Polars-python. Is it possible to read multiple files with globbing patterns using as storage_options adlfs?

Loading multiple files using glob patterns works if we run it on a local filesystem, as written in the documentation. However, if I try to load several files at once from the Azure Data Lake Gen2, it only loads into the DataFrame the first file that…

python dataframe python-polars

asked Oct 03 '22 at 07:10

Javi Hernandez

314
8
17

0

votes

1 answer

Use f-string in polars dataframe with a loop

I am trying to create a list of new columns based on the latest column. I can achieve this by using with_columns() and simple multiplication. Given that I want a long list of new columns, I am thinking of using a loop with an f-string to do it. However, I…

python python-polars

asked Oct 01 '22 at 12:50

codedancer

1,504
9
20

0

votes

1 answer

Polars - how to parallelize lambda that uses only Polars expressions?

This runs on a single core, despite not using (seemingly) any non-Polars stuff. What am I doing wrong? (the goal is to convert a list in doc_ids field in every row into its string representation, s.t. [1, 2, 3] (list[int]) -> '[1, 2, 3]'…

python-polars

asked Sep 28 '22 at 16:32

Tim

236
2
8

0

votes

1 answer

How to check if dataframe columns contains any information except NULL/EMPTY and show them in a new column in python polars?

I have a dataframe as- pl.DataFrame({'last_name':['Unknown','Mallesham',np.nan,'Bhavik','Unknown'], 'first_name_or_initial':['U',np.nan,'TRUE','yamulla',np.nan], …

python python-polars

asked Sep 26 '22 at 17:50

myamulla_ciencia

1,282
1
8
30

0

votes

1 answer

Broadcast a single cell value to a column

In pandas it is possible to broadcast a single value to an entire column or even a slice: frame.loc[start_index:stop_index, 'a'] = frame.loc[some_row_index, 'a'] that is, a single value being broadcast to a Series. I tried something similar with…

python-polars

asked Sep 23 '22 at 18:42

sobek

1,386
10
28

0

votes

1 answer

Overwrite a slice of a timeseries with a value

I have some timeseries data in the form of a pl.DataFrame object with a datetime col and a data col. I would like to correct an error in the data that occurs during a distinct time range by overwriting it with a value. Now in pandas, one would use…

python-polars

asked Sep 23 '22 at 06:31

sobek

1,386
10
28

0

votes

0 answers

Connectorx Server requested a connection to an alternative address in azure pipeline

Connecting to SQL Server using connectorx and polars. Everything works correctly locally without any errors. However, when using Azure Pipelines to run the code, I get the following error: "result = _read_sql(RuntimeError: Server requested a…

python azure-pipelines connector python-polars

asked Sep 22 '22 at 09:22

tommyt

309
5
15

0

votes

1 answer

Does `pl.concat([lazyframe1, lazyframe2])` strictly preserve the order of the input dataframes?

Suppose I create a polars Lazyframe from a list of csv files using pl.concat(): df = pl.concat([pl.scan_csv(file) for file in ['file1.csv', 'file2.csv']]) Is the data in the resulting dataframe guaranteed to have the exact order of the input files,…

python-polars

asked Sep 19 '22 at 12:31

DataWiz

401
6
14

0

votes

1 answer

Specify string format for numeric during conversion to pl.Utf8

Is there any way to specify a format specifier if, for example, casting a pl.Float32, without resorting to complex searches for the period character? As in something like: s = pl.Series([1.2345, 2.3456, 3.4567]) s.cast(pl.Utf8, fmt="%0.2f") # fmt…

python python-polars

asked Sep 18 '22 at 22:58

NedDasty

192
1
8

0

votes

1 answer

Is it semantically possible to optimize LazyFrame -> Fill Null -> Cast to Categorical?

Here is a trivial benchmark based on a real-life workload. import gc import time import numpy as np import polars as pl df = ( # I have a dataframe like this from reading a csv. pl.Series( name="x", values=np.random.choice( …

python-polars

asked Sep 17 '22 at 14:49

John Hopfensperger

147
4

0

votes

1 answer

LazyFrame memory usage (polars.scan_csv vs polars.read_csv, single threaded)

I have some sample csv files and two programs to read/filter/concat the csvs. Here is the LazyFrame version of the code: import os os.environ["POLARS_MAX_THREADS"] = "1" import polars as pl df = pl.concat( [ …

python-polars

asked Sep 17 '22 at 02:24

John Hopfensperger

147
4
