27

I'm teaching myself Python and have run into a problem that requires down-sampling a feature vector. I need some help understanding how to down-sample an array. In the array, each row represents an image as numbers from 0 to 255. How do you apply down-sampling to the array? I don't want to use scikit-learn because I want to understand how down-sampling is applied. If you could explain down-sampling too, that would be amazing, thanks.

The feature vector is 400x250.

Neo Streets

3 Answers

42

If with downsampling you mean something like this, you can simply slice the array. For a 1D example:

import numpy as np
a = np.arange(1,11,1)
print(a)
print(a[::3])

The last line is equivalent to:

print(a[0:a.size:3])

with the slicing notation as start:stop:step

Result:

[ 1  2  3  4  5  6  7  8  9 10]
[ 1  4  7 10]
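To make the `start:stop:step` behaviour a bit more concrete, here is a small sketch (the variable names `step` and `sub` are only illustrative). It checks how many samples survive for a given step, shows that a different start offset keeps a different subset, and confirms that basic slicing gives a view onto the original data rather than a copy:

```python
import math
import numpy as np

a = np.arange(1, 11)      # 10 elements: 1..10
step = 3
sub = a[::step]           # keeps indices 0, 3, 6, 9

# The number of kept samples is ceil(n / step)
assert sub.size == math.ceil(a.size / step)

# A different start offset keeps a different subset
offset = a[1::step]       # indices 1, 4, 7
print(offset)             # [2 5 8]

# Basic slicing returns a view: no data is copied
assert np.shares_memory(a, sub)
```

Because the result is a view, writing into `sub` would also change `a`; call `.copy()` on the slice if you need an independent array.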

For a 2D array the idea is the same:

b = np.arange(0,100)
c = b.reshape([10,10])
print(c[::3,::3])

This gives you, in both dimensions, every third item from the original array.
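As a quick check of the 2D case, this sketch prints the shape and contents of the sliced array (the name `small` is just for illustration):

```python
import numpy as np

c = np.arange(100).reshape(10, 10)

# Keep rows 0, 3, 6, 9 and columns 0, 3, 6, 9
small = c[::3, ::3]

print(small.shape)  # (4, 4)
print(small)
# [[ 0  3  6  9]
#  [30 33 36 39]
#  [60 63 66 69]
#  [90 93 96 99]]
```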

Or, if you only want to down sample a single dimension:

d = np.zeros((400,250))
print(d.shape)
e = d[::10,:]
print(e.shape) 

(400, 250)

(40, 250)
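If each row is one image, as in the question, you probably want to keep all 400 rows and reduce the 250 values per row instead. The following is only a sketch under assumptions: the data is random stand-in data, the factor of 5 is a hypothetical choice that happens to divide 250, and each row is treated as a 1D signal. It compares plain decimation with averaging groups of neighbouring values:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in data (assumption): 400 images, 250 values each, in 0..255
X = rng.integers(0, 256, size=(400, 250), dtype=np.uint8)

factor = 5  # hypothetical factor; must divide 250 for the reshape below

# Option 1: decimation, keep every 5th value in each row
X_dec = X[:, ::factor]

# Option 2: average each group of 5 neighbouring values in a row
X_avg = X.reshape(400, 250 // factor, factor).mean(axis=2)

print(X_dec.shape, X_avg.shape)  # (400, 50) (400, 50)
```

If the rows are really flattened 2D images, you would first reshape each row back to its image dimensions, since neighbouring values in the flat row are not always neighbours in the image.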

There are lots of other examples in the NumPy manual.

Bart
  • but how do you do this for a 2d array – Neo Streets Dec 11 '15 at 21:03
  • I updated the answer. But I'm not sure to what size you want to downsample your original 400x250 array? – Bart Dec 11 '15 at 21:07
  • Saying _"this doesn't work"_ isn't very helpful. What doesn't work? Or even better: could you provide a simple example of exactly how the down sampling should work (e.g., from a 2D array `[[0,1,..,9],[10,11,..,19]`, the down sampled array should contain elements `[[1,3,..],[11,13,..]]`)? Which items should be kept? Or you mention that you don't want to use `scikit-learn`, but which routine should it reproduce? – Bart Dec 11 '15 at 22:08
  • I don't think this can answer what OP was trying to ask. What he meant is [down-sampling in Machine Learning](https://developers.google.com/machine-learning/data-prep/construct/sampling-splitting/imbalanced-data). –  Jun 15 '21 at 12:54
  • 1
    @Phan Nhat Huy, yes, now (5-6 years later) I would also interpret OP's question differently. Feel free to write an answer more suitable for ML. – Bart Jun 15 '21 at 13:26
0

If you want to downsample along certain dimensions, you can use the mean, which averages neighbouring values instead of simply discarding them as decimation does. The example below downsamples an ndarray of shape (h,w,3) by a factor of 2 along axes 0 and 1, but not along axis 2:

import numpy as np

def downsample_2x(array3d):
    """
        Downsamples an ndarray of shape `(h,w,3)` by 2 along axes 0,1 (along h,w)
        Input can be non-float, e.g. uint8
    """
    dtype1 = array3d.dtype
    a = array3d.astype(float)
    (h,w,_) = a.shape
    # Both spatial dimensions must be even
    assert w % 2 == 0
    assert h % 2 == 0
    w2 = w // 2
    h2 = h // 2
    # Average horizontally adjacent pairs of pixels
    a = a.reshape((h,w2,2,3))
    a = np.mean(a, axis=2)
    assert a.shape == (h,w2,3)
    # Average vertically adjacent pairs of rows
    a = a.reshape((h2,2,w2,3))
    a = np.mean(a, axis=1)
    assert a.shape == (h2,w2,3)
    # Round down and restore the input dtype
    a = np.floor(a).astype(dtype1)
    return a

This gives an array of shape (h/2,w/2,3). If w and h are not even numbers, it will be slightly more complicated. This is not the most efficient way to do it, but the steps and ideas should be clear.
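One possible way to handle sizes that are not multiples of the factor is to crop the trailing rows and columns first. The sketch below is a variation on the idea above, not the same function: `block_mean` and `factor` are hypothetical names, it does both averages in a single reshape, and it returns floats instead of flooring back to the input dtype.

```python
import numpy as np

def block_mean(img, factor):
    """Average non-overlapping factor x factor blocks of an (h, w) or (h, w, c) array."""
    h, w = img.shape[:2]
    h2, w2 = h // factor, w // factor
    # Crop so both spatial dimensions are multiples of factor
    # (this drops any trailing rows/columns that do not fill a block)
    cropped = img[:h2 * factor, :w2 * factor].astype(float)
    # Split each spatial axis into (block index, position within block)
    blocks = cropped.reshape(h2, factor, w2, factor, *img.shape[2:])
    # Average over the two "position within block" axes
    return blocks.mean(axis=(1, 3))

img = np.arange(5 * 7 * 3).reshape(5, 7, 3)   # odd height and width
out = block_mean(img, 2)
print(out.shape)  # (2, 3, 3)
```

Cropping throws away a little data at the edges; padding the array up to the next multiple would be an alternative, at the cost of edge blocks that mix real and padded values.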

Sohail Si
-1

import numpy as np
from skimage.measure import block_reduce
# block_size=(m, n) is the block shape; func can be np.mean, np.max, etc.
new_matrix = block_reduce(Matrix_for_downsample, block_size=(m, n), func=np.mean)

  • As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Mar 23 '23 at 08:28
  • The question explicitly mentions that OP does not want to use `scikit-learn`. – Bart Apr 07 '23 at 06:23