Consider a Cython extension class that takes some input and then runs a method to build out a 2D array. One of the final solution's dimensions is known at instantiation (based on user input); however, the other dimension depends on various factors that are not known ahead of time. I am looking for the fastest way to grow a solution array.
This is a similar question to the ones discussed here and here. I put some of the suggestions discussed in those posts to the test below. I want to see if the community has any further thoughts on this topic, particularly with Cython 3.0.0's release - are there any more recent tricks that can be employed?
Appending ndarrays to Python Lists
Example code:
import numpy as np

cdef class ListOfNdarrays:

    cdef Py_ssize_t y_size, index
    cdef list storage
    cdef const double[::1] y0_view
    cdef public double[:, ::1] solution

    def __init__(self, const double[::1] y0):
        self.y0_view = y0
        self.y_size = self.y0_view.size
        self.storage = list()
        self.run()

    cdef run(self):
        cdef Py_ssize_t i, j
        cdef double[::1] y_view
        y = np.empty(self.y_size, dtype=np.float64, order='C')
        y_view = y

        while True:
            # For testing purposes, let the loop run for a set number of steps.
            # In reality, _we won't know how many steps will be required_.
            # This 10_000 value would change based on the calculations performed below.
            if self.index > 10_000:
                break

            for i in range(self.y_size):
                # Perform some complex calculations to find y...
                y_view[i] = self.y0_view[i] + (<double>self.index / (<double>i + 1.))

            # Make a copy of the y array for this step. Store it in the list.
            self.storage.append(y_view.copy())
            self.index += 1

        # Save the results in a more user-friendly format
        solution_array = np.empty((self.y_size, self.index), dtype=np.float64, order='C')
        self.solution = solution_array
        for i in range(self.y_size):
            for j in range(self.index):
                self.solution[i, j] = self.storage[j][i]
If I run the above I find...
# Create a y0 array with 10 random values (y_size=10)
y0 = np.random.random_sample(10)
%timeit ListOfNdarrays(y0)
# Times on the order of 25ms
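As an aside, a variation on this approach that I did not time separately is to store full ndarray copies in the list and let numpy do the final packing in one call. A minimal plain-numpy sketch of the idea (placeholder names and sizes, not the class above):

import numpy as np

# Sketch of the list-then-pack idea (hypothetical, not part of the timings above)
storage = []
y = np.zeros(10)
for step in range(5):
    y[:] = step               # stand-in for the per-step calculation
    storage.append(y.copy())  # full copy, since y is reused every step

# Pack the list into a (y_size, n_steps) C-ordered array in one call
solution = np.stack(storage, axis=1)
assert solution.shape == (10, 5)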
Use a Prebuilt np.ndarray
Instead of appending to a Python list, create a numpy array with a fixed size that is based on a user-provided (or algorithmically determined) guess(*). If the guess turns out to be wrong, resize the array as needed.
Relevant code changes from above:
    ... # Other class attributes
    cdef Py_ssize_t expected_size, concats
    cdef double[:, ::1] storage

    def __init__(self, const double[::1] y0, Py_ssize_t expected_size):
        # Set up a storage array based on a user-defined guess of the final size.
        self.concats = 1
        self.expected_size = expected_size
        storage_arr = np.empty((self.y_size, self.expected_size), dtype=np.float64, order='C')
        self.storage = storage_arr
        ... # Other init steps

    cdef run(self):
        cdef double[:, ::1] new_storage_view
        while True:
            ... # Run the loop as normal. At the end of the loop, check if there is enough room.
            if self.index >= (self.concats * self.expected_size):
                self.concats += 1
                new_storage = np.empty((self.y_size, self.expected_size * self.concats),
                                       dtype=np.float64, order='C')
                new_storage_view = new_storage
                # Populate the old values
                for i in range(self.y_size):
                    for j in range(self.index):
                        new_storage_view[i, j] = self.storage[i, j]
                # Reassign the variable
                self.storage = new_storage_view
            # Add this step's results to storage
            for i in range(self.y_size):
                self.storage[i, self.index] = y_view[i]
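A side note on the snippet above: the element-by-element copy during the resize can probably be collapsed into a memoryview slice assignment, which Cython compiles into a bulk buffer copy. A sketch using the same variable names (not timed separately):

                # Sketch: copy the already-filled portion of the old storage in one
                # slice assignment instead of the explicit double loop
                new_storage_view[:, :self.index] = self.storage[:, :self.index]
                self.storage = new_storage_view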
I find a big improvement: roughly two orders of magnitude faster than appending to a list.
# Manually expanding numpy storage by fixed increments based on an initial guess
# Times if we have a perfect guess: 280us
%timeit ManuallyGrowNdarray(y0, 10_001)
# In reality we will never have a perfect guess for the size...
# Times if we overshoot by 2x: 641us
%timeit ManuallyGrowNdarray(y0, 2 * 10_001)
# Times if we undershoot by 2x: 687us
%timeit ManuallyGrowNdarray(y0, int(0.5 * 10_001))
Note: if I use np.resize instead of building a new np.empty array during the resize step, I find worse performance (823us if we undershoot by 2x).
Use C pointers and PyMem_Malloc to build the output array
The last example uses a raw C array with PyMem_Malloc (I also tried C's malloc/realloc but did not see a noticeable difference).
Relevant changes:
from cpython.mem cimport PyMem_Malloc, PyMem_Realloc, PyMem_Free

    ... # Other definitions
    cdef double* storage

    def __init__(self, const double[::1] y0, Py_ssize_t expected_size):
        # Set up a storage array based on a user-defined guess of the final size.
        self.concats = 1
        self.expected_size = expected_size
        self.new_size = self.expected_size
        self.storage = <double*>PyMem_Malloc(self.y_size * self.expected_size * sizeof(double))
        if not self.storage:
            raise MemoryError()
        ... # Other init steps

    cdef run(self):
        cdef double* new_storage
        while True:
            # Like before: run the loop as normal. At the end of the loop, check if there is enough room.
            if self.index >= (self.concats * self.expected_size):
                # We need to make a larger array
                self.concats += 1
                self.new_size = self.concats * self.expected_size
                new_storage = <double*>PyMem_Realloc(self.storage,
                                                     self.y_size * self.new_size * sizeof(double))
                if not new_storage:
                    # self.storage is left untouched so it can still be freed later
                    raise MemoryError()
                self.storage = new_storage
            # Add this step's results to storage
            for i in range(self.y_size):
                self.storage[self.y_size * self.index + i] = y_view[i]
    # Not shown: convert the final result to a numpy array for a friendlier output... Make sure to free the C buffer's memory...
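For completeness, here is roughly what that not-shown conversion and cleanup could look like. This is only a sketch: finalize is a hypothetical method name, solution is assumed to be declared as in the first example, and the pointer-to-memoryview cast assumes the index-major layout used in run() above.

    cdef finalize(self):
        # View the raw buffer as an (index, y_size) block, copy it into a
        # Python-owned numpy array, and transpose to the (y_size, index) shape
        cdef double[:, :] raw_view = <double[:self.index, :self.y_size]>self.storage
        self.solution = np.asarray(raw_view).T.copy()

    def __dealloc__(self):
        # Free the PyMem_Malloc'd buffer; PyMem_Free(NULL) is a safe no-op
        PyMem_Free(self.storage)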
There is a noticeable performance boost if the array needs to be expanded.
# Use C pointers with CPython's PyMem allocation functions
# Times if we have a perfect guess: 296us
%timeit GrowCArray(y0, 10_001)
# Times if we overshoot by 2x: 472us
%timeit GrowCArray(y0, 2 * 10_001)
# Times if we undershoot by 2x: 574us
%timeit GrowCArray(y0, int(0.5 * 10_001))
Summary:
What is the fastest way to grow 2D arrays in a Cython 3.0.0+ world? These simplified experiments found that preallocating memory (with either numpy or directly in C) is the fastest. But are there better approaches?
*: Resizing by a fixed increment, as is done here, is not the best way to go about it, since if you overshoot by even a little you have just grabbed double the memory you really need. I am also curious what good algorithms exist for choosing a more appropriate growth size - are there packages that can do this work for us? As noted, np.resize() has a big performance penalty, as does list.append().
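For reference on the growth-size question, the usual alternative to fixed increments is geometric growth: over-allocate by a constant factor so the total copying cost stays amortized O(n), which is roughly what CPython's list and C++'s std::vector do internally. A sketch of how the resize check in the PyMem example might look with a ~1.5x factor, where capacity is a hypothetical attribute replacing the expected_size/concats bookkeeping and new_storage is the temporary pointer from the realloc pattern above:

            if self.index >= self.capacity:
                # Grow the capacity by roughly 1.5x (and by at least 1 slot)
                self.capacity += (self.capacity >> 1) + 1
                new_storage = <double*>PyMem_Realloc(self.storage,
                                                     self.y_size * self.capacity * sizeof(double))
                if not new_storage:
                    raise MemoryError()
                self.storage = new_storage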
All examples were tested in a Jupyter notebook using the compiler directives: # cython: boundscheck=False, wraparound=False, nonecheck=False, cdivision=True, initializedcheck=False
Package versions: Python 3.11; Cython 3.0.0; NumPy 1.24.3