
The Polars DataFrame does not currently provide a method to update the value of a single cell. Instead, we have to use the method DataFrame.apply or DataFrame.apply_at_idx, which updates a whole column / Series. This can be very expensive when an algorithm repeatedly updates a few elements of some columns. Why is DataFrame designed this way? Looking at the code, it seems to me that Series does provide inner mutability via the method Series._get_inner_mut?

cafce25
Benjamin Du
    I think that's just a result of Polars heavily optimizing for concurrent SIMD operations on the columns. If you regularly need to modify single elements, then Polars is probably the wrong crate for your use case. – cafce25 Jan 01 '23 at 05:50

1 Answer


As of polars >= 0.15.9, mutating any data backed by numbers has constant complexity, O(1), as long as the data is not shared. This covers numeric data, dates, and durations.

If the data is shared, we must first copy it so that we become the sole owner.
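The ownership rule above can be sketched in plain Python. This is only a toy model of copy-on-write, assuming a simple owner count; the names SharedBuffer, CowSeries, and copies_made are hypothetical and are not part of Polars, whose real implementation is in Rust.

```python
# Toy copy-on-write model. Hypothetical names; NOT Polars' actual implementation.


class SharedBuffer:
    """A list of values plus a count of how many handles point at it."""

    def __init__(self, values):
        self.values = values
        self.owners = 1


class CowSeries:
    """Toy series: clone() is O(1); a write copies only if the buffer is shared."""

    def __init__(self, values):
        self._buf = SharedBuffer(list(values))
        self.copies_made = 0  # number of full-buffer copies triggered by writes

    def clone(self):
        # Cloning only shares the buffer and bumps the owner count.
        other = CowSeries.__new__(CowSeries)
        other._buf = self._buf
        other.copies_made = 0
        self._buf.owners += 1
        return other

    def __getitem__(self, idx):
        return self._buf.values[idx]

    def __setitem__(self, idx, value):
        if self._buf.owners > 1:
            # Shared: detach first by copying the whole buffer, O(n).
            self._buf.owners -= 1
            self._buf = SharedBuffer(list(self._buf.values))
            self.copies_made += 1
        # Sole owner at this point: write in place, O(1).
        self._buf.values[idx] = value


s = CowSeries([0.0] * 1000)
s[10] = 10.0      # sole owner, so the write happens in place
s2 = s.clone()    # cheap: both handles share one buffer
s2[11] = 11.0     # shared, so s2 copies the buffer before writing
s[12] = 12.0      # s is the sole owner of the original buffer again
```

In this sketch only the write through s2 pays for a copy, and s never sees the value written through s2.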

import polars as pl
import matplotlib.pyplot as plt
from time import time

ts = []
ts_shared = []
clone_times = []
ns = []

for n in [1e3, 1e5, 1e6, 1e7, 1e8]:
    s = pl.zeros(int(n))
    
    t0 = time()
    # we are the only owner
    # so mutation is inplace
    s[10] = 10
    
    # time
    t = time() - t0
    
    # store datapoints
    ts.append(t)
    ns.append(n)
    
    # clone is free
    t0 = time()
    s2 = s.clone()
    t = time() - t0
    clone_times.append(t)
    
    
    # now there are two owners of the memory
    # we write to it so we must copy all the data first
    t0 = time()
    s2[11] = 11
    t = time() - t0
    ts_shared.append(t)
    


plt.plot(ns, ts_shared, label="writing to shared memory")
plt.plot(ns, ts, label="writing to owned memory")
plt.plot(ns, clone_times, label="clone time")
plt.legend()

[Plot: time against series length n for writing to shared memory, writing to owned memory, and cloning]

In Rust, this dispatches to set_at_idx2, which is not released yet. Note that when you use the lazy engine, all of this is done implicitly for you.

ritchie46
    The Polars DataFrame does not currently have a method to mutate a single cell. The recommended method is `DataFrame::apply`, which takes a function and updates the whole column. Will the Polars DataFrame expose an API to update a single cell in the future? – Benjamin Du Jan 02 '23 at 19:19