
While learning NumPy, I came across one of its advertised advantages:

NumPy requires less memory than a traditional Python list.

import numpy as np
import sys

# Less Memory

l = range(1000)
print(sys.getsizeof(l[3])*len(l))   # size of one Python int times the length

p = np.arange(1000)
print(p.itemsize*p.size)            # size of the array's data buffer

This looks convincing, but then when I try,

print(sys.getsizeof(p[3])*len(p))

it shows a higher memory size than the list.

Can someone help me understand this behavior?

  • `p[3]` creates a new object, a "boxed" version that wraps the primitive data type contained inside the array buffer. Note, using `sys.getsizeof` has a lot of caveats, especially when dealing with containers. – juanpa.arrivillaga Apr 16 '18 at 07:20
  • 2
  • getsizeof is not a good measure of list memory usage. It only shows the memory used to store pointers. – hpaulj Apr 16 '18 at 07:21

2 Answers


First of all, as mentioned in the comments, getsizeof() is not a good function to rely on for this purpose, because it is implementation-specific and does not have to hold true for third-party extensions. Also, as mentioned in the documentation, if you want to find the size of a container and all of its contents, there is a recipe available at: https://code.activestate.com/recipes/577504/.
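As a rough illustration of that idea, here is a minimal sketch (the `total_size` helper below is illustrative, not the recipe's actual code): it adds the container's own size to the size of everything it points to.

import sys

def total_size(obj, seen=None):
    """Rough recursive estimate of an object's footprint (a simplified,
    illustrative sketch of the recipe's idea, not its actual code)."""
    if seen is None:
        seen = set()
    if id(obj) in seen:               # don't double-count shared objects
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(total_size(k, seen) + total_size(v, seen) for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(total_size(item, seen) for item in obj)
    return size

print(total_size(list(range(1000))))  # container plus every int it points to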

Now, regarding NumPy arrays, it's very important to know how NumPy determines its arrays' types. For that purpose, you can read: How does numpy determine the array's dtype and what it means?

To sum up, the most important reason NumPy performs better in memory management is that it provides a wide variety of types that you can use for different kinds of data. You can read about NumPy's datatypes here: https://docs.scipy.org/doc/numpy-1.14.0/user/basics.types.html. Another reason is that NumPy is a library designed to work with matrices and arrays, and for that reason there are many under-the-hood optimizations on how their items consume memory.
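For example, a small sketch of how choosing a narrower dtype shrinks the data buffer (assuming the values fit the smaller type; here 0..999 fits comfortably in int16):

import numpy as np

a64 = np.arange(1000, dtype=np.int64)   # 8 bytes per element
a16 = np.arange(1000, dtype=np.int16)   # 2 bytes per element; 0..999 still fits

print(a64.itemsize * a64.size)          # 8000
print(a16.itemsize * a16.size)          # 2000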

Also, it's noteworthy that Python provides an array module designed to perform efficiently by using constrained item types.

Arrays are sequence types and behave very much like lists, except that the type of objects stored in them is constrained. The type is specified at object creation time by using a type code, which is a single character.
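A quick illustrative sketch of that module (the 'q' type code stores signed 64-bit integers, one fixed-size slot per item):

from array import array
import sys

arr = array('q', range(1000))     # 'q': signed 64-bit integers

print(arr.itemsize * len(arr))    # 8000 bytes of actual data
print(sys.getsizeof(arr))         # the data plus a small object header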

jpp
Mazdak

It's easier to understand the memory use of arrays:

In [100]: p = np.arange(10)
In [101]: sys.getsizeof(p)
Out[101]: 176
In [102]: p.itemsize*p.size
Out[102]: 80

The data buffer of p is 80 bytes long. The rest of the 176 bytes is object overhead: attributes like shape, strides, etc.
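A rough sketch of that split (using the `nbytes` attribute, which reports only the data-buffer size): for arrays that own their data, the gap between `getsizeof` and `nbytes` stays roughly constant as the array grows, because it is just the fixed object overhead.

import numpy as np
import sys

for n in (10, 1000, 100000):
    p = np.arange(n)
    # nbytes is the data buffer alone (itemsize * size);
    # getsizeof also counts the fixed object header (shape, strides, dtype, ...)
    print(n, p.nbytes, sys.getsizeof(p))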

Indexing a single element of the array returns a numpy scalar object, not a plain Python int:

In [103]: q = p[0]
In [104]: type(q)
Out[104]: numpy.int64
In [105]: q.itemsize*q.size
Out[105]: 8
In [106]: sys.getsizeof(q)
Out[106]: 32

So this multiplication doesn't tell us anything useful:

In [109]: sys.getsizeof(p[3])*len(p)
Out[109]: 320

Though it may help us estimate the size of this list:

In [110]: [i for i in p]
Out[110]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [111]: type(_[0])
Out[111]: numpy.int64
In [112]: sys.getsizeof(__)
Out[112]: 192

The list of 10 int64 objects occupies 320 + 192 bytes, more or less: the list object with its pointer buffer (192) plus the boxed objects it points to (10 × 32 = 320).
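A minimal sketch of that estimate, summing the container and the objects it points to (exact numbers vary by platform and NumPy version):

import sys
import numpy as np

p = np.arange(10)
lst = [i for i in p]                              # ten boxed numpy.int64 objects

pointer_buffer = sys.getsizeof(lst)               # list object plus its pointer array
pointed_to = sum(sys.getsizeof(x) for x in lst)   # the boxed scalars themselves
print(pointer_buffer, pointed_to, pointer_buffer + pointed_to)   # roughly 192 + 320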

We can extract a plain Python int from the array with `item()`:

In [115]: p[0].item()
Out[115]: 0
In [116]: type(_)
Out[116]: int
In [117]: sys.getsizeof(p[0].item())
Out[117]: 24

Lists of the same length can have differing sizes, depending on how much growth space they have:

In [118]: sys.getsizeof(p.tolist())
Out[118]: 144
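For example, a list grown by repeated `append` typically over-allocates to amortize growth, while one built in a single step is usually sized exactly (a CPython detail; exact numbers vary by version):

import sys

grown = []
for i in range(10):
    grown.append(i)          # append over-allocates spare capacity

exact = list(range(10))      # built from a known length, usually sized exactly

print(sys.getsizeof(grown), sys.getsizeof(exact))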

Further complicating things is the fact that small integers are stored differently from large ones: in CPython, integers from -5 to 256 are cached and reused as singletons.
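A quick illustration of that small-integer cache (a CPython implementation detail, not a language guarantee):

import sys

a = 200
b = int("200")
print(a is b)                 # True: CPython caches ints from -5 to 256 as singletons

c = int("10000000000")
d = int("10000000000")
print(c is d)                 # False: larger ints are created fresh each time

print(sys.getsizeof(1), sys.getsizeof(10**100))   # bigger values also need more bytes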

hpaulj