I was looking at how much space numpy arrays consume in memory and I noticed a peculiar behavior:
When I ran x = np.empty((1000000, 7, 64, 64), dtype=np.uint8)
my computer with 16 GB of memory did not crash. Instead it sailed along smoothly, with only about 2 GB of memory allocated.
This numpy array should weigh in at 26.70 GB, but something lazy seems to be happening. As soon as I add one to the array, the laziness stops: my program hangs and then dies with a MemoryError.
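To double-check that size figure, here is the arithmetic (my own sanity check, not from numpy's docs): the array has 1000000 × 7 × 64 × 64 elements at one byte each for uint8.

```python
import numpy as np

# Shape and dtype from the example above; uint8 is 1 byte per element.
shape = (1000000, 7, 64, 64)
n_bytes = int(np.prod(shape, dtype=np.int64)) * np.dtype(np.uint8).itemsize
print(n_bytes, n_bytes / 2**30)  # 28672000000 bytes, ~26.70 GiB
```

So the 26.70 GB figure checks out, which makes it all the stranger that only ~2 GB ever gets used.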
I'm wondering how numpy does this under the hood.
I took a look at numpy.core.multiarray and found numpy/core/src/multiarray/multiarraymodule.c, which contains this bit of code that seems to be the definition of empty:
static PyObject *
array_empty(PyObject *NPY_UNUSED(ignored), PyObject *args, PyObject *kwds)
{
    static char *kwlist[] = {"shape", "dtype", "order", NULL};
    PyArray_Descr *typecode = NULL;
    PyArray_Dims shape = {NULL, 0};
    NPY_ORDER order = NPY_CORDER;
    npy_bool is_f_order;
    PyArrayObject *ret = NULL;

    if (!PyArg_ParseTupleAndKeywords(args, kwds, "O&|O&O&", kwlist,
                                     PyArray_IntpConverter, &shape,
                                     PyArray_DescrConverter, &typecode,
                                     PyArray_OrderConverter, &order)) {
        goto fail;
    }

    switch (order) {
        case NPY_CORDER:
            is_f_order = NPY_FALSE;
            break;
        case NPY_FORTRANORDER:
            is_f_order = NPY_TRUE;
            break;
        default:
            PyErr_SetString(PyExc_ValueError,
                            "only 'C' or 'F' order is permitted");
            goto fail;
    }

    ret = (PyArrayObject *)PyArray_Empty(shape.len, shape.ptr,
                                         typecode, is_f_order);

    PyDimMem_FREE(shape.ptr);
    return (PyObject *)ret;

 fail:
    Py_XDECREF(typecode);
    PyDimMem_FREE(shape.ptr);
    return NULL;
}
I'm wondering how this laziness is achieved in C, and where else it will pop up in numpy.
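For what it's worth, I can reproduce the same laziness with a plain anonymous mmap, with no numpy involved at all, which makes me suspect this is the OS's demand paging rather than anything numpy does itself (this is just my own experiment, not numpy's code):

```python
import mmap

# Reserve 1 GiB of anonymous memory. On Linux the kernel only hands out
# physical pages when they are first touched (demand paging), so this
# reservation is nearly free until the buffer is actually written.
buf = mmap.mmap(-1, 1 << 30)

buf[0] = 1   # touching a byte commits only that single page
print(buf[0], len(buf))
buf.close()
```

Reserving the gigabyte is instant and resident memory barely moves; it is only writing into the buffer that makes real pages appear, which matches what I see when I add one to the array.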