9

If I have a very large numpy array with one useless column, how could I drop it without creating a copy of the original array?

np.delete(my_np_array, 0, 1)

The above code will return a copy of the array without the zero-th column. But instead I would like to simply delete that column from my_np_array since I don't need it. For very large datasets, the memory management becomes important and copying may not be an option.
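The copying behaviour described above can be checked directly. This is a minimal sketch, assuming a small stand-in array in place of the large one; `trimmed` and `shares` are illustrative names only.

```python
import numpy as np

my_np_array = np.arange(12.0).reshape(3, 4)   # small stand-in for the large array
trimmed = np.delete(my_np_array, 0, 1)        # drop column 0 along axis 1

# np.delete builds a new array, so the two do not share a buffer.
shares = np.shares_memory(my_np_array, trimmed)
print(trimmed.shape, shares)
```

The original array keeps its full shape, and both buffers are alive at the same time, which is the memory cost the question is trying to avoid.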

ProgramFOX
Krishan Gupta
  • 2
    Preventing copies in numpy is unfortunately tricky business. If it's the first or last column, you can probably get away with a simple array slice. That probably won't copy the array *right now*, but then if you go on to do more complicated things with it, there's no guarantee that you won't get copies (or temporary arrays) later in your code (AFAIK). – mgilson Dec 14 '13 at 07:30
  • @Krishan Is it possible to load it differently into Python? If the array is somehow generated from data, can we do something to kill the column beforehand? If not, can we preprocess the array in other ways, like in `MATLAB`? – Ray Dec 15 '13 at 17:10

3 Answers

5

If memory is the main concern, you can move columns around within your array so that the unneeded column ends up at the very end, then use ndarray.resize, which modifies the array in place, to shrink it and discard the last column.

You cannot simply remove the first column of an array in place with the provided API. I suspect this is because of an ndarray's memory layout, which maps multidimensional indices to one-dimensional, byte-oriented offsets within a contiguous block of memory.

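The following example copies the last column into the first and then deletes the last (now unneeded) one, releasing the associated memory. It removes the obsolete column from memory completely, at the cost of changing your column order. Note that ndarray.resize truncates the end of the underlying flat buffer, so this drops exactly the last column only when the array is stored column-major (Fortran order); on a row-major (C-order) array the remaining values end up in different rows.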

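# A must own its data and be stored column-major (Fortran order)
D1, D2 = A.shape
A[:, 0] = A[:, D2-1]  # overwrite the unneeded first column with the last one
A.resize((D1, D2-1), refcheck=False)  # shrink in place, cutting off the end of the buffer
A.shape
# => would be (5, 4) if the shape was initially (5, 5) for example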
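To see why the storage order matters for this trick, here is a small sketch with made-up values. It assumes a Fortran-ordered array that owns its data, and contrasts it with a C-ordered one; `expected` and `C` are names introduced only for the check.

```python
import numpy as np

# Column-major (Fortran-order) array that owns its data; values are made up.
A = np.array(np.arange(20.0).reshape(4, 5), order="F")
expected = A[:, [4, 1, 2, 3]]            # columns that should remain (a copy, for checking)

D1, D2 = A.shape
A[:, 0] = A[:, D2 - 1]                   # overwrite unwanted column 0 with the last column
A.resize((D1, D2 - 1), refcheck=False)   # cut the last column off the end of the buffer
print(A.shape, np.array_equal(A, expected))

# Same call on a row-major (C-order) array: the flat buffer is truncated,
# so the result is simply the first 16 values laid out as 4 x 4.
C = np.arange(20.0).reshape(4, 5).copy()
C.resize((4, 4), refcheck=False)
print(np.array_equal(C, np.arange(16.0).reshape(4, 4)))
```

If the large array is C-ordered, converting it to Fortran order would itself require a copy, so this approach fits best when the array can be created column-major in the first place.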
matehat
4

If you use slicing, numpy won't make a copy; in other words

import numpy

a = numpy.array([1, 2, 3, 4, 5])
b = a[1:]  # view elements from second to last, NOT making a copy
b[0] = 12  # Change first element of `b`, i.e. second of `a`
print(a)

will print [ 1 12  3  4  5]

If you need to delete an element in the middle, however, a single slice won't work.
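For the two-dimensional case in the question, the same idea can be sketched as follows. The array contents are made up, and `c` illustrates one way of skipping a middle column (integer-array indexing), which does copy.

```python
import numpy as np

a = np.arange(12.0).reshape(3, 4).copy()   # small stand-in that owns its data
b = a[:, 1:]                               # view without column 0

print(np.shares_memory(a, b), b.base is a)  # b reuses a's buffer
b[0, 0] = -1.0                              # writes through to a[0, 1]
print(a[0, 1])
print(b.flags["C_CONTIGUOUS"])              # the view is strided, not contiguous

c = a[:, [0, 2, 3]]                         # dropping a middle column this way copies
print(np.shares_memory(a, c))
```

Because `b` keeps a reference to `a` through `b.base`, the full buffer, including the skipped column, stays allocated for as long as the view exists.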

6502
  • 1
    just adding... for the OP case do `b=a[:,1:]` – Saullo G. P. Castro Dec 14 '13 at 09:52
  • I would be curious to know, since numpy provides you with a new "view" of the same data structure when you use slicing, when does it actually "free" the memory used up by some column inside it? In other words, does it rely on Python's GC or something? Your solution basically tells numpy you now need a new view of the same data, you're not yet telling it that it can discard some portion of it. If the OP's concern was to free up some portion of unneeded memory as fast as possible, I wonder when that is going to happen. – matehat Dec 14 '13 at 20:00
  • 1
    @matehat memory deallocation won't happen, as a view is merely a method to access memory allocated for array differently (casted to other types, reshaped/broadcasted to different shape, skipping some columns/rows etc), and can be freed only completely, not by parts – alko Dec 14 '13 at 22:26
0

Numpy arrays are effectively fixed-size, so they can't be resized without creating an intermediate copy (see "How to remove specific elements in a numpy array"). Creating a view with slicing and then making a copy of that view is probably the fastest you can do.

In [804]: a = np.ones((2,2))

In [805]: a
Out[805]:
array([[ 1.,  1.],
       [ 1.,  1.]])

In [806]: np.resize(a,(3,2))
Out[806]:
array([[ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.]])

In [807]: a  # if np.resize had worked in place, a would now have shape (3, 2)
Out[807]:
array([[ 1.,  1.],
       [ 1.,  1.]])
Community
M4rtini
  • As per [ndarray.resize's documentation](http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.resize.html#numpy.ndarray.resize), arrays can be modified in-place. – matehat Dec 14 '13 at 19:55
  • There's `np.resize` which returns a new array, and `a.resize` (where `a` is an array) which resizes it in-place – matehat Dec 14 '13 at 20:23
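The distinction drawn in the two comments above can be sketched as follows, reusing the 2x2 array of ones from the answer. Passing `refcheck=False` is an assumption made here so the call also works when other references to the array exist, as in an interactive session.

```python
import numpy as np

a = np.ones((2, 2))

b = np.resize(a, (3, 2))           # function: returns a new array, a is untouched
print(a.shape, b.shape)

a.resize((3, 2), refcheck=False)   # method: resizes a itself
print(a.shape)
print(a)                           # the added row is zero-filled
```

The two also fill new entries differently: `np.resize` repeats the existing data, while the `ndarray.resize` method pads with zeros.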