
I'm trying to implement a low-pass filter on accelerometer data (with x-acceleration (ax), y-acceleration (ay), and z-acceleration (az) columns).

I have calculated my alpha to be 0.2.

The DC component along the x direction is calculated using the formula

new_ax[n] = (1-alpha)*new_ax[n-1] + (alpha * ax[n])
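For example, with alpha = 0.2 and a previous output of 0, the input samples 1.0, 1.0, 1.0 produce outputs 0.2, 0.36, 0.488.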

I'm able to calculate this for a small dataset with a few thousand records, but for a dataset with a million records the code below takes forever to run. I would appreciate any help improving its running time.

# df is a pandas DataFrame with ax, ay, az columns
n_ax = []
seq = range(0, 1000000, 128)
for w in range(len(seq)):
    prev_x = 0
    if w+1 <= len(seq):
        subdf = df[seq[w]:seq[w+1]]
        for i in range(len(subdf)):
            # .iloc for positional access; the slice keeps the original labels
            new_x = (1-alpha)*prev_x + (alpha*subdf.ax.iloc[i])
            n_ax.append(new_x)
            prev_x = new_x
user1946217

2 Answers


First, it seems you don't need

if w+1 <= len(seq):

since w never goes past len(seq)-1, so the condition is always true.

To cut the processing time, use the numpy module:

import numpy

Here you will find arrays and methods that are much faster than the built-in list. For example, instead of looping through every element of a numpy array to do some processing, you can apply a numpy function directly to the whole array and get the result in seconds rather than hours. As an example:

data = numpy.arange(0, 1000000, 128)
shiftData = numpy.arange(128, 1000000, 128)
# elementwise operations over whole arrays, with no Python-level loop
result = (1-alpha)*data[:-1] + alpha*shiftData

Check some tutorials on numpy. I use this module for processing image data; by comparison, looping through lists would have taken me 2 weeks to process 5000+ images, while using numpy types takes at most 2 minutes.
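One caveat: the filter in the question is recursive (each output depends on the previous output), so it cannot be written as a single elementwise expression like the example above. If scipy is available, scipy.signal.lfilter evaluates exactly this kind of recurrence in compiled code. A minimal sketch, assuming alpha = 0.2 and the samples in a column df.ax:

from scipy.signal import lfilter

alpha = 0.2
x = df.ax.values                  # raw samples as a numpy array
# y[n] = (1-alpha)*y[n-1] + alpha*x[n] maps to these filter coefficients:
b = [alpha]                       # feedforward (input) coefficients
a = [1.0, -(1 - alpha)]           # feedback (output) coefficients
new_ax = lfilter(b, a, x)

This processes the whole million-row column in one call instead of a Python-level loop.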

DorinPopescu

Assuming you are using Python 2.7:

  • Use xrange instead of range.
  • Computing len(seq) inside the loop is not necessary, since its value does not change.
  • Accessing seq is not really needed, since you can compute the indices on the fly.
  • You don't really need the if statement, since in your code it always evaluates to true (w is in range(len(seq)), so its maximum value is len(seq)-1).
  • The slicing you are doing to get subdf is not really necessary, since you can index df directly (and slicing creates a new object).

See the code below.

n_ax = []
SUB_SAMPLE = 128
SAMPLE_LEN = 1000000
seq_len = SAMPLE_LEN / SUB_SAMPLE   # integer division in Python 2
for w in xrange(seq_len):
    prev_x = 0
    # walk each 128-sample block directly, no slicing needed
    for i in xrange(w * SUB_SAMPLE, (w + 1) * SUB_SAMPLE):
        new_x = (1 - alpha) * prev_x + alpha * df.ax[i]
        n_ax.append(new_x)
        prev_x = new_x

I cannot think of any other obvious optimization. If this is still slow, perhaps you should consider copying the df data to a Python native data type. If these are all floats, use the python array module, which gives very good performance; see the sketch below.
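A minimal sketch of that idea, copying the column into an array of C doubles once up front (reusing SUB_SAMPLE, seq_len and alpha from the code above):

from array import array

ax_data = array('d', df.ax)   # one-time copy of the column into C doubles
n_ax = array('d')
for w in xrange(seq_len):
    prev_x = 0.0
    for i in xrange(w * SUB_SAMPLE, (w + 1) * SUB_SAMPLE):
        prev_x = (1 - alpha) * prev_x + alpha * ax_data[i]
        n_ax.append(prev_x)

This avoids the pandas attribute lookup in the inner loop, which is where most of the time goes.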

And if you still need better performance, you can try parallelism with the multiprocessing module, or write a C module that takes an array in memory and does the computation, and call it with the ctypes library.
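One thing worth noting: because prev_x resets to 0 at the start of every 128-sample block in the original code, the blocks are independent of each other, which makes the multiprocessing route straightforward. A rough sketch of that idea (assuming alpha = 0.2; on Windows the Pool creation needs the usual if __name__ == '__main__' guard):

import numpy
from multiprocessing import Pool

alpha = 0.2

def filter_block(block):
    # run the recurrence over one block, starting from prev_x = 0
    # (matching the per-block reset in the original code)
    out = []
    prev_x = 0.0
    for x in block:
        prev_x = (1 - alpha) * prev_x + alpha * x
        out.append(prev_x)
    return out

pool = Pool()                    # one worker per CPU core by default
blocks = numpy.array_split(df.ax.values, len(df) // 128)
n_ax = numpy.concatenate(pool.map(filter_block, blocks))
pool.close()
pool.join()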

eguaio