
I have a time series dataset with some NaN values in it. I want to detrend this data.

I tried this:

scipy.signal.detrend(y)

Then I got this error:

ValueError: array must not contain infs or NaNs

Then I tried with:

scipy.signal.detrend(y.dropna())

But then the result is shorter than the original and no longer lines up with the original positions of the data.

How can I solve this problem?

vestland
bikuser
    You *could* set the `NaN` values to be the average of the values near it. Your other main option is to detrend the data manually; i.e., use linear least squares on the data you *do* have and then subtract that line. Not too difficult. There are a *bunch* of functions for this: `scipy.linalg.lstsq`, `scipy.stats.linregress`, `scipy.optimize.least_squares`, `scipy.optimize.lsq_linear`, `numpy.linalg.lstsq`. – alkasm Jun 27 '17 at 13:07
  • @AlexanderReynolds Yes, that's true. Interpolation does not help me, so subtracting a line from the data could help. – bikuser Jun 27 '17 at 13:16
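A minimal sketch of the manual route from the first comment (fit a line to the samples you do have, then subtract it), using `numpy.linalg.lstsq`, one of the functions listed there. The helper name `detrend_lstsq` is hypothetical, and `x` is assumed to hold the sample positions or times.

```python
import numpy as np


def detrend_lstsq(x, y):
    """Hypothetical helper: subtract a least-squares line fitted to the non-NaN samples.

    NaN entries of y stay NaN in the output, so every sample keeps its position.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    valid = ~np.isnan(y)  # boolean mask of usable samples
    # Design matrix with columns [x, 1], so the solution is (slope, intercept)
    A = np.column_stack([x[valid], np.ones(valid.sum())])
    (m, b), *_ = np.linalg.lstsq(A, y[valid], rcond=None)
    return y - (m * x + b)  # NaN minus a number stays NaN


# Made-up example: a straight line with two missing samples
x = np.arange(8, dtype=float)
y = 0.5 * x + 3.0
y[[2, 5]] = np.nan
print(detrend_lstsq(x, y))
```

The output has the same length as `y`, and the NaNs sit at the same indices as in the input.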

2 Answers


For future reference, there is a digital signal processing Stack Exchange site, https://dsp.stackexchange.com/, which I would suggest using for signal-processing questions.


The easiest way I can think of is to detrend your data manually by fitting a least-squares line. The fit uses both your x and y values, so you can simply drop the x values at the positions where y is NaN and fit the line to the remaining pairs.
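For reference, this is the standard closed form of the least-squares line fitted only to the valid samples. The notation is introduced here: V is the set of indices i where y_i is not NaN, and the means are taken over V only.

```latex
m = \frac{\sum_{i \in V} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i \in V} (x_i - \bar{x})^2},
\qquad
b = \bar{y} - m\,\bar{x},
\qquad
\bar{x} = \frac{1}{|V|}\sum_{i \in V} x_i,
\quad
\bar{y} = \frac{1}{|V|}\sum_{i \in V} y_i
```

The detrended value at every index i is y_i - (m x_i + b). Indices outside V stay NaN.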

You can get a boolean mask of the non-NaN values with not_nan_ind = ~np.isnan(y), and then do a linear regression on the non-NaN values of y and the corresponding x values with, say, scipy.stats.linregress():

m, b, r_val, p_val, std_err = stats.linregress(x[not_nan_ind],y[not_nan_ind])

Then you can simply subtract off this line from your data y to obtain the detrended data:

detrend_y = y - (m*x + b)
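To check the three steps above by hand, here is a tiny made-up series lying exactly on y = 2x + 1 with one missing value. The fit should recover slope 2 and intercept 1, and the detrended values should be about zero everywhere except at the NaN.

```python
import numpy as np
from scipy import stats

# Tiny hand-checkable series on y = 2*x + 1, with one missing value
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, np.nan, 5.0, 7.0])

not_nan_ind = ~np.isnan(y)  # boolean mask: True, False, True, True
m, b, r_val, p_val, std_err = stats.linregress(x[not_nan_ind], y[not_nan_ind])
detrend_y = y - (m * x + b)

print(m, b)       # slope and intercept of the fitted line
print(detrend_y)  # NaN stays at index 1; the other entries are ~0
```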

And that's all you need. For example, with some dummy data:

import numpy as np
from matplotlib import pyplot as plt
from scipy import stats

# create data
x = np.linspace(0, 2*np.pi, 500)
y = np.random.normal(0.3*x, np.random.rand(len(x)))
drops = np.random.rand(len(x))
y[drops > .95] = np.nan  # add some random NaNs into y (np.NaN was removed in NumPy 2.0)
plt.plot(x, y)

[Plot: data with some NaN values]

# find linear regression line, subtract off data to detrend
not_nan_ind = ~np.isnan(y)
m, b, r_val, p_val, std_err = stats.linregress(x[not_nan_ind],y[not_nan_ind])
detrend_y = y - (m*x + b)
plt.plot(x, detrend_y)

[Plot: detrended data]
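Since the question calls y.dropna(), y is presumably a pandas Series. Here is a sketch of the same linregress approach wrapped so that the result keeps the Series index. Two assumptions to flag: the helper name `detrend_series` is hypothetical, and the samples are assumed to be evenly spaced, so the integer position stands in for x.

```python
import numpy as np
import pandas as pd
from scipy import stats


def detrend_series(y: pd.Series) -> pd.Series:
    """Hypothetical helper: remove a linear trend from a Series that contains NaNs.

    Assumes evenly spaced samples, so the integer position is used as x.
    The result keeps the original index and the original NaN positions.
    """
    x = np.arange(len(y), dtype=float)
    values = y.to_numpy(dtype=float)
    valid = ~np.isnan(values)  # fit only on the non-NaN samples
    m, b, *_ = stats.linregress(x[valid], values[valid])
    return pd.Series(values - (m * x + b), index=y.index, name=y.name)


# Made-up daily series on a straight line, with three missing days
idx = pd.date_range("2017-06-01", periods=10, freq="D")
y = pd.Series(3.0 * np.arange(10) + 2.0, index=idx)
y.iloc[[1, 4, 5]] = np.nan
detrended = detrend_series(y)
print(detrended)
```

If the timestamps are irregular, you would instead pass numeric times (for example, elapsed seconds) as x. That choice is up to you and is not part of the original answer.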

alkasm

Detrend only the non-NaN parts, but keep the NaN values in place:

mask = ~pd.isna(signal)  # pd is pandas; True where signal is not NaN
signal[mask] = scipy.signal.detrend(signal[mask])
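One caveat, going by the scipy.signal.detrend signature: it takes no x values and fits the trend against sample position. Once the NaNs are masked out, the remaining samples are treated as consecutive, so any gaps are closed up. For data that were evenly spaced to begin with, that can distort the fitted trend. Below is a small comparison on a made-up, purely linear series with a gap. The x-aware alternative is the linregress approach from the other answer, not part of this one.

```python
import numpy as np
import pandas as pd
import scipy.signal
from scipy import stats

# Purely linear made-up series sampled at t = 0..9, with a gap at t = 1, 2, 3
t = np.arange(10, dtype=float)
signal = 2.0 * t
signal[[1, 2, 3]] = np.nan

mask = ~pd.isna(signal)  # True where signal is not NaN

# This answer's approach: detrend the non-NaN samples by their position
by_position = signal.copy()
by_position[mask] = scipy.signal.detrend(signal[mask])

# Alternative: fit the line against the actual sample times t
m, b, *_ = stats.linregress(t[mask], signal[mask])
by_time = signal - (m * t + b)

print(np.nanmax(np.abs(by_position)))  # clearly non-zero: the closed gap bends the fit
print(np.nanmax(np.abs(by_time)))      # ~0: the linear trend is removed exactly
```

When gaps are few and short compared with the record, the two results may be close. Fitting against the real x values avoids the issue altogether.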
Brydenr
Wolfgang G