Polars' DataFrame currently does not provide a method to update the value of a single cell. Instead, we have to use the method `DataFrame.apply` or `DataFrame.apply_at_idx`, which updates a whole column / Series. This can be very expensive when an algorithm repeatedly updates a few elements of some columns. Why is DataFrame designed this way? Looking into the code, it seems to me that Series does provide inner mutability via the method `Series._get_inner_mut`?
cafce25
Benjamin Du
I think that's just a result of Polars heavily optimizing for concurrent SIMD operations on the columns. If you regularly need to modify single elements, then Polars is probably the wrong crate for your use case. – cafce25 Jan 01 '23 at 05:50
1 Answer
As of polars >= 0.15.9, mutating any data backed by numbers is constant time, O(1), if the data is not shared. That covers numeric data, dates, and durations.
If the data is shared, we must first copy it so that we become the sole owner.

    import polars as pl
    import matplotlib.pyplot as plt
    from time import time

    ts = []
    ts_shared = []
    clone_times = []
    ns = []

    for n in [1e3, 1e5, 1e6, 1e7, 1e8]:
        s = pl.zeros(int(n))

        t0 = time()
        # we are the only owner
        # so mutation is in place
        s[10] = 10
        # time
        t = time() - t0

        # store datapoints
        ts.append(t)
        ns.append(n)

        # clone is free
        t0 = time()
        s2 = s.clone()
        t = time() - t0
        clone_times.append(t)

        # now there are two owners of the memory
        # we write to it, so we must copy all the data first
        t0 = time()
        s2[11] = 11
        t = time() - t0
        ts_shared.append(t)

    plt.plot(ns, ts_shared, label="writing to shared memory")
    plt.plot(ns, ts, label="writing to owned memory")
    plt.plot(ns, clone_times, label="clone time")
    plt.legend()
    plt.show()

In Rust this dispatches to `set_at_idx2`, but that is not released yet. Note that with the lazy engine this is all done implicitly for you.
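
To make the owned-versus-shared behaviour concrete, here is a minimal pure-Python sketch of copy-on-write with an explicit owner count. `Buffer`, `CowSeries`, and `shares_memory_with` are hypothetical names invented for illustration; this is a toy model of the idea described above, not how Polars implements it internally.

```python
from array import array


class Buffer:
    """Shared storage plus a count of the handles pointing at it."""

    def __init__(self, values):
        self.values = array("d", values)
        self.owners = 1


class CowSeries:
    """Hypothetical copy-on-write column (assumption, not the Polars code)."""

    def __init__(self, values):
        self._buf = Buffer(values)

    def clone(self):
        # O(1): share the buffer and bump the owner count, no data is copied.
        other = CowSeries.__new__(CowSeries)
        other._buf = self._buf
        self._buf.owners += 1
        return other

    def __getitem__(self, idx):
        return self._buf.values[idx]

    def __setitem__(self, idx, value):
        if self._buf.owners > 1:
            # Shared: copy all the data first (O(n)) so that this handle
            # becomes the sole owner of its own buffer.
            self._buf.owners -= 1
            self._buf = Buffer(self._buf.values)
        # Sole owner: write in place, O(1).
        self._buf.values[idx] = value

    def shares_memory_with(self, other):
        return self._buf is other._buf


s = CowSeries([0.0] * 1000)
s[10] = 10.0                       # sole owner: in-place write
s2 = s.clone()                     # cheap: both handles share one buffer
shared_before = s.shares_memory_with(s2)
s2[11] = 11.0                      # shared: s2 copies first, then writes
```

After the last line, `s` and `s2` no longer share a buffer, and the write to `s2` is not visible through `s`. This mirrors the three curves in the benchmark above: cloning and owned writes stay flat, while the first write to shared memory pays for a full copy.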

ritchie46
The Polars DataFrame currently does not have a method to mutate a single cell. The recommended method is `DataFrame::apply`, which takes a function and updates the whole column. Will the Polars DataFrame expose an API to update a single cell in the future? – Benjamin Du Jan 02 '23 at 19:19