
I am trying to downsample a fixed [Mx1] vector into any given [Nx1] dimensions using an averaging method. I have a dynamic window size that changes every time depending on the required output size. So, in some cases I get lucky and get an integer window size that fits perfectly, and sometimes I get a floating-point number as the window size. But how can I use a floating-point window size to produce an [Nx1] vector from a fixed [Mx1] vector?

Below is the code that I have tried:

import math
import numpy as np

chunk = 0.35

def fixed_meanVector(vec, chunk):
    size = vec.size * chunk               # size of output according to the chunk
    R = vec.size / size                   # window size to transform array into chunk size
    pad_size = math.ceil(float(vec.size) / R) * R - vec.size
    vec_padded = np.append(vec, np.zeros(int(pad_size)) * np.nan)

    print("Org Vector:", vec.size, "Output Size:", size, "Window Size:", R, "Padding size:", pad_size)
    # reshape only accepts an integer window, so R = 2.857... is truncated to 2
    newVec = np.nanmean(vec_padded.reshape(-1, int(R)), axis=1)
    print("New Vector shape:", newVec.shape)
    return newVec

# vector1, vector2 and cosine are defined elsewhere
print("Word Mean of N values Similarity:",
      cosine(fixed_meanVector(vector1, chunk),
             fixed_meanVector(vector2, chunk)))

Output:

New Vector shape: (200,)
Org Vector: 400 Output Size: 140.0 Window Size: 2.85714285714 Padding size: 0.0
New Vector shape: (200,)
0.46111661289

In the above example, I need to downsample the [Mx1] ([400x1]) vector to [Nx1] ([140x1]) dimensions. So, dynamically, a window size of [2.857x1] could be used to downsample the [Mx1] vector. But in this case I get a [200x1] vector as my output instead of [140x1], because the floating-point window is floored: floor(2.857) = 2, so the vector is downsampled with a [2x1] window. Padding is zero because the window size fits the new [Nx1] dimensions exactly. So, is there any way to use such window sizes to downsample an [Mx1] vector?
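To make the failure concrete, here is a minimal sketch (using the sizes above) of how flooring the window size changes the output length:

    import numpy as np

    vec = np.ones(400)
    R = 400 / 140                         # 2.857..., the ideal window size
    print(int(R))                         # 2: reshape only accepts an integer window
    print(vec.reshape(-1, int(R)).shape)  # (200, 2) -> 200 averaged values, not 140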

Nomiluks
  • Is the number N fixed? – Hun Mar 29 '16 at 14:27
  • @sung No... It is the size of the output... And in the code it is ... size = (vec.size*chunk) – Nomiluks Mar 29 '16 at 15:08
  • If both M and N are not fixed, it is a bit complicated to come up with a very general algorithm. Since your vector is not too big, I would say do it iteratively. What I mean by that is do moving average using a floored integer. Then, you will have a bigger than needed sampled vector. You can repeat the process again using the new vector, etc. – Hun Mar 29 '16 at 16:25
  • @Sung [Mx1] is fixed and we need to downsample it to any [Nx1] dimension. I have made a few changes in the question, please read it again, and thanks for your interest :) – Nomiluks Mar 29 '16 at 16:55

2 Answers


It is possible but not natural to vectorise that as soon as M % N > 0, because the number of cells used to build each result value is not constant: between 3 and 4 in your case.

The natural method is to run through the array, adjusting at each bin:

(figure: the input split into equal-width fractional bins, with each bin's overflow carried into the next)

The idea is to fill each bin until it overflows, then cut off the overflow (the carry) and keep it for the next bin. The last carry is always zero, because the bin edges line up exactly with the end of the input.

The code:

import numpy as np

def resized(data, N):
    """Resample data (size M) to size N by averaging over fractional bins."""
    M = data.size
    res = np.empty(N, data.dtype)
    carry = 0.0      # fraction of the last sample that overflowed the previous bin
    m = 0
    for n in range(N):
        acc = carry
        # add whole samples while bin n's right edge has not been reached
        while m * N - n * M < M:
            acc += data[m]
            m += 1
        # fraction of data[m-1] that belongs to the next bin (needs true division)
        carry = (m - (n + 1) * M / N) * data[m - 1]
        acc -= carry
        res[n] = acc * N / M  # normalise: each bin spans M/N input samples
    return res

Test:

In [5]: resized(np.ones(7),3)
Out[5]: array([ 1.,  1.,  1.])

In [6]: %timeit resized(rand(400),140)
1000 loops, best of 3: 1.43 ms per loop

It works, but not very quickly. Fortunately, you can speed it up with numba:

from numba import jit
resized2 = jit(resized)

In [7]: %timeit resized2(rand(400),140)
1 loops, best of 3: 8.21 µs per loop

Probably faster than any pure numpy solution (here for M = 3*N):

In [8]: %timeit rand(402).reshape(-1,3).mean(1)
10000 loops, best of 3: 39.2 µs per loop

Note that it also works for upsampling, when N > M:

In [9]: resized(arange(4.),9)
Out[9]: array([ 0.  ,  0.  ,  0.75,  1.  ,  1.5 ,  2.  ,  2.25,  3.  ,  3.  ])
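For completeness, here is one way to vectorise the same fractional-bin averaging after all (my own sketch, not part of the original answer; `resized_vec` is a hypothetical name): linearly interpolate the cumulative sum at the fractional bin edges, then difference it.

    import numpy as np

    def resized_vec(data, N):
        # cumulative sum with a leading zero: csum[k] = data[:k].sum()
        M = data.size
        csum = np.concatenate(([0.0], np.cumsum(data)))
        # interpolate the cumulative sum at the N+1 fractional bin edges,
        # then difference to get each bin's (fractional) sum
        edges = np.linspace(0, M, N + 1)
        c = np.interp(edges, np.arange(M + 1), csum)
        return np.diff(c) * N / M  # normalise to a mean per bin

Under these assumptions, resized_vec(np.ones(7), 3) and resized_vec(np.arange(4.), 9) reproduce the same outputs as the loop version above.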
B. M.
  • Hey, your method is cool! I have tried your approach and it can reduce vectors to any size. I don't know what the limitations of your approach are. Just trying to understand how you are doing it :) – Nomiluks Mar 30 '16 at 12:59
  • I added some explanation, it was late ;) – B. M. Mar 30 '16 at 16:00
  • Wow, that's a great answer :) Please add more information if possible... – Nomiluks Mar 30 '16 at 18:31
  • Your function is somewhat similar to savgol_filter (Savitzky-Golay filter) combined with interpolation, but better, as the output vector has the desired size independent of some fixed window length, am I right? You solved two of my problems at once. – hyamanieu Feb 09 '17 at 13:29

You're doing it wrong: you build a window for your required decimation, not the other way around.

Mr. Nyquist says you can't have bandwidth above fs/2, or you'll get nasty aliasing.

So to solve it you don't just "average": you low-pass filter, so that frequencies above fs/2 are below your acceptable noise floor.

Moving averages are a valid type of low-pass filter; you're just applying one to the wrong array.

The usual approach for arbitrary decimation is:

 Upsample -> Lowpass -> Downsample

So, to arbitrarily decimate from N to M samples, the algorithm is (a sketch follows the list):

  • find the LCM of your current sample count N and your target sample count M
  • upsample by LCM/N
  • design a filter with a stop frequency ws <= M/LCM
  • downsample by LCM/M
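A minimal sketch of this chain, assuming the question's sizes (N = 400 in, M = 140 out). scipy.signal.resample_poly implements exactly the upsample -> lowpass -> downsample scheme with a built-in anti-aliasing FIR filter:

    import numpy as np
    from scipy import signal

    N, M = 400, 140                        # input and output sizes from the question
    lcm = np.lcm(N, M)                     # 2800
    up, down = lcm // N, lcm // M          # upsample by 7, downsample by 20

    x = np.random.rand(N)
    y = signal.resample_poly(x, up, down)  # upsample -> lowpass -> downsample
    print(y.shape)                         # (140,)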

What you call the averaging method is a FIR filter with a rectangular window.

If you use the first zero of that window's frequency response as the stop band, then you can calculate the needed window size K as:

2/K <= M/LCM

so you must use windows of size:

K = ceil(2*LCM/M)
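As a worked example with the question's sizes (N = 400, M = 140), under the assumptions above:

    import math

    N, M = 400, 140
    LCM = math.lcm(N, M)        # 2800 (math.lcm needs Python 3.9+; use np.lcm otherwise)
    K = math.ceil(2 * LCM / M)  # from 2/K <= M/LCM, so K >= 2*LCM/M = 40
    print(LCM, K)               # 2800 40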

Obviously, you don't need to implement all of this yourself. Just design a proper window with ws <= M/LCM and apply it using scipy.signal.resample.

And if the ceil applied to the window size messes up your results, don't use rectangular windows; there are tons of better filters you can use.

xvan
  • Okay. Your suggested approach is a bit confusing for me, because I think it is more applicable to signal processing etc. But I just want to downsample an array by averaging. If there is any method better than this approach, please share. It is hard to understand your answer :) – Nomiluks Mar 30 '16 at 13:01
  • So, as it looks like you don't care that much about the result, just do `window=np.ones(ceil(float(n)/len(x)))`, `window=window/len(window)`, `scipy.signal.resample(x,n,window=window)` – xvan Mar 30 '16 at 15:05
  • Sure, I will try this method as well :) But is it possible to downsample an array while preserving the actual meaning of the original vector? – Nomiluks Mar 30 '16 at 15:07
  • BTW, what you're doing *IS* signal processing. – xvan Mar 30 '16 at 15:09
  • Nope, I have vectors that represent a sentence word. So, I just need to resize the word vector without losing its representational power, as in the original – Nomiluks Mar 30 '16 at 15:12
  • As long as you don't have any meaningful information at frequencies above N/M in the original vector. Check with an FFT. The quality of the result will depend on the size of the transition band and the attenuation of the rejected band of the anti-aliasing filter. – xvan Mar 30 '16 at 15:16
  • Or if you don't care about aliasing at all, call `scipy.signal.resample(x,n)` – xvan Mar 30 '16 at 15:20
  • Frankly speaking, I don't know about FFT and anti-aliasing filters, so I need to understand these things to understand your method. Sorry for this – Nomiluks Mar 30 '16 at 15:20