Using the best answer from this post: Reducing noise on Data

I cannot manage to reuse that code to denoise my data, a CSV file that can be found here: https://drive.google.com/open?id=1qVOKjDTAIEdB4thiTgv7BmSmfIoDyZ0J

My code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.signal import lfilter

data = pd.read_csv("Gain_Loss_test.csv")

#plot the original data
x = np.arange(1, 300, 1)  # x axis
y = data
plt.plot(x, y, linewidth=1, linestyle="-", c="b")

#apply the filter and plot the denoised data
n = 15  # the larger n is, the smoother curve will be
b = [1.0 / n] * n
a = 1
yy = lfilter(b,a,y)
plt.plot(x, yy, linewidth=1, linestyle="-", c="b")
plt.show()
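The filter defined in the code above (n equal taps of 1/n, with a = 1) is a causal n-point moving average. As a sketch, assuming a synthetic noisy signal y in place of the CSV (the signal is made up for illustration), lfilter can be checked against the equivalent truncated np.convolve:

```python
import numpy as np
from scipy.signal import lfilter

# Hypothetical noisy 1-D signal standing in for the CSV column.
rng = np.random.default_rng(0)
y = np.sin(np.linspace(0, 6, 300)) + 0.3 * rng.standard_normal(300)

n = 15
b = [1.0 / n] * n   # n equal taps that sum to 1
a = 1               # no feedback terms: a pure FIR filter

yy = lfilter(b, a, y)

# The same causal moving average written as a convolution, truncated to len(y):
# yy[k] = (y[k] + y[k-1] + ... + y[k-n+1]) / n, with samples before y[0] taken as 0.
yy_conv = np.convolve(y, np.ones(n) / n)[: len(y)]

print(np.allclose(yy, yy_conv))  # prints True
```

On a genuinely one-dimensional input like this, larger n averages over more samples and the curve gets smoother, which is the behaviour the original post describes.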

Both charts look the same; only the scale changes, depending on n. I don't want to rescale the data, I want to smooth it. The original post also uses n=15, but there the denoised data is not rescaled. Changing n only changes the scale; there is no smoothing.

Before filter: [image]

After filter: [image]

Edit: After applying the fix proposed in the answer, the data is smoothed with no scaling: [image]

Marco Cerliani
Hugues
  • My data is in the Google Drive link above (a CSV file that can be downloaded). If I use the random data you propose instead, it works fine: no scaling, only smoothing. I confirm that I use b = [1.0 / n] * n. I'm not sure why this depends on my data. – Hugues Nov 17 '18 at 17:07

1 Answer

Note that you should use header=None when you read that file using pandas.read_csv, otherwise the first line of data is treated as a header:

In [27]: data = pd.read_csv("Gain_Loss_test.csv", header=None)
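To see the effect of the header argument without the original file, here is a sketch using a made-up four-line CSV held in memory via io.StringIO (the contents are hypothetical, not the asker's data):

```python
import io
import pandas as pd

# Hypothetical stand-in for Gain_Loss_test.csv: one number per line, no header row.
csv_text = "0.5\n1.2\n0.9\n1.4\n"

# Default header handling: the first value is consumed as the column name.
with_header = pd.read_csv(io.StringIO(csv_text))

# header=None: every line is data and the single column is labelled 0.
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(with_header.shape, list(with_header.columns))
print(no_header.shape, list(no_header.columns))
```

Once header=None is used, the real file yields 300 rows instead of 299, so the x axis from the question, np.arange(1, 300, 1), which has only 299 points, would need to become something like np.arange(1, len(data) + 1) for plotting.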

The reason for the strange result of filtering data with lfilter is that the Pandas DataFrame looks like a two-dimensional array with shape (300, 1):

In [28]: data.shape
Out[28]: (300, 1)

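scipy.signal.lfilter works with n-dimensional arrays, but it must be told which axis contains the signal(s) to be filtered. The default is axis=-1, the last axis. For your data, that means it is filtering 300 signals, each of length 1. With a = 1, the first output sample of the filter is b[0] * x[0], so each length-1 "signal" is simply multiplied by b[0] = 1/n, which is exactly the pure rescaling you observed. That is definitely not what you want.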
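A small sketch of this axis behaviour, assuming a random (300, 1) array as a hypothetical stand-in for the DataFrame's values:

```python
import numpy as np
from scipy.signal import lfilter

n = 15
b = [1.0 / n] * n
a = 1

# Hypothetical column of 300 samples shaped like the DataFrame: (300, 1).
rng = np.random.default_rng(1)
col = rng.standard_normal((300, 1))

# Default axis=-1: 300 separate signals of length 1. Each output is only the
# first step of the filter, b[0] * x[0] = x / n, so the data is just rescaled.
along_last = lfilter(b, a, col)

# axis=0: one signal of length 300, filtered along the rows as intended.
along_rows = lfilter(b, a, col, axis=0)

print(np.allclose(along_last, col / n))  # True: pure scaling by 1/n
print(np.allclose(along_rows, col / n))  # False: a genuine moving average
```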

There are several simple ways to fix this:

  • Use axis=0 in the lfilter call:

    yy = lfilter(b, a, data, axis=0)
    
  • Instead of passing the DataFrame to lfilter, pass just the first column:

    yy = lfilter(b, a, data[0])
    

    data[0] is a Pandas Series object, which looks one-dimensional.

  • Skip Pandas, and read the data using, say, numpy.loadtxt:

    In [46]: data = np.loadtxt('Gain_Loss_test.csv')
    
    In [47]: data.shape
    Out[47]: (300,)
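As a rough check that the three fixes agree, here is a sketch that builds a hypothetical headerless single-column CSV in memory (random-walk values, not the real file) and filters it each way:

```python
import io
import numpy as np
import pandas as pd
from scipy.signal import lfilter

n = 15
b = [1.0 / n] * n
a = 1

# Hypothetical single-column CSV with no header, standing in for Gain_Loss_test.csv.
rng = np.random.default_rng(2)
values = np.cumsum(rng.standard_normal(300))
csv_text = "\n".join(f"{v:.6f}" for v in values)

data = pd.read_csv(io.StringIO(csv_text), header=None)  # DataFrame, shape (300, 1)

yy_axis0 = lfilter(b, a, data, axis=0)[:, 0]  # fix 1: filter along the rows
yy_series = lfilter(b, a, data[0])            # fix 2: pass the 1-D Series
yy_loadtxt = lfilter(b, a, np.loadtxt(io.StringIO(csv_text)))  # fix 3: 1-D array

print(np.allclose(yy_axis0, yy_series), np.allclose(yy_series, yy_loadtxt))
```

All three paths hand lfilter a single signal of length 300, so they should produce the same smoothed curve.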
    
Warren Weckesser
  • Yup, you nailed it: axis=0 fixed my problem. I would not have found that on my own, thanks a lot! – Hugues Nov 17 '18 at 18:38