2

Say you have the following series of values:

import numpy as np
data1=np.array([1,  0.28571429,  0.20408163,  0.16326531,  0.14285714,
        0.10204082,  0.08163265,  0.06122449,  0.04081633,  0.02040816])

You now want to plot the Spectral Density of data1 using numpy.fft:

ps1 = np.abs(np.fft.fft(data1))**2
time_step = 1 / 30
freqs1 = np.fft.fftfreq(data1.size, time_step)
idx1 = np.argsort(freqs1)
fig,ax=plt.subplots()
ax.plot(freqs1[idx1], ps1[idx1],color='red',label='data1')
plt.grid(True, which="both")
ax.set_yscale('log')
ax.set_xscale('log')
plt.xlabel("Frequency")
plt.ylabel("Power")
plt.xlim(0,15)

enter image description here

My question: if the plot represents the signal of my series, how can I determine from the power spectral density if this is 1/f (or any other) noise?

MB-F
  • 22,770
  • 4
  • 61
  • 116
FaCoffee
  • 7,609
  • 28
  • 99
  • 174
  • 2
    How do you know there is noise in the data? Conversely, how do you know there is anything other than noise in the data? And, importantly, what is your definition of noise? – MB-F Feb 28 '17 at 14:53
  • I don't know if noise is present or not. I would like to figure it out. This could be a series with `1/f` noise or something else. How can I figure it out? – FaCoffee Feb 28 '17 at 14:58
  • The problem is, you can't. Unless you know what you are looking for in the data. If you have a noise-free version of the signal you can compare the spectra and everything that is not present in the niose-free version is obviously noise. If you do not know what you are looking for you cannot know what is noise and what is not. Consider this: (a) you want to record speech. The sound of a waterfall nearby is noise. (b) You want to record the waterfall. The sound of the oaf talking nearby is niose. See where I'm aiming at? – MB-F Feb 28 '17 at 15:05
  • In order to identify noise you will have to make assumptions. You can have a template of either or you can use some known properties. For example, the noise may be stationary. In that case, remove any instationarities in the data and what is left is the noise. Or, you may be sure the noise is 1/f, in that case remove anything that does not fit the distribution well. In short, there is no generic way to do this. If there was noise-cancelling headphones would actually work well :) – MB-F Feb 28 '17 at 15:11
  • If I look at the image I am tempted to conclude this series has something like a power law behavior, but to say this you need to prove it has `1/f` noise. But since this is not a signal, how do you process a dataset which you don't know much about? – FaCoffee Feb 28 '17 at 15:13
  • Ah, so you think this is only noise? There you have your assumption! – MB-F Feb 28 '17 at 15:28
  • Ok, so if that's my assumption... that's already noise and there's nothing I need to do. Correct? – FaCoffee Feb 28 '17 at 15:33
  • 1
    That depends, but it certainly is something we can work with. I originally thought there was a useful signal as well as noise in the data and you wanted to extract the noise. If the data is actually noise and you want to know if it really is the kind of noise you think it is, then check out the longer answer I posted below. – MB-F Feb 28 '17 at 15:38
  • So I should change the question title I guess... – FaCoffee Feb 28 '17 at 15:43
  • The title is not ideal, but the question was more misleading :) – MB-F Feb 28 '17 at 15:51
  • Ok, please feel free to suggest a more appropriate question title and I will update it for the benefit of others. Thanks. – FaCoffee Feb 28 '17 at 15:52
  • 1
    I took the liberty to simply edit the question. Please make sure the phrasing is still in your interest and correct (I'm not a native speaker). Oh, and check the updated answer. It really seems to have the spectrum of 1/f noise, except for the DC part, which is tricky anyway. – MB-F Feb 28 '17 at 16:03
  • That's more than OK, thanks. I've read something about the DC part, but since my data do not come from a signal, what does it mean that my spectrum has a peak at 0 Hz? – FaCoffee Feb 28 '17 at 16:07
  • 1
    It means that on average the data is not 0. The DC part is the mean of the data, and it is equivalent to the 0 Hz spectral peak. – MB-F Feb 28 '17 at 16:12

1 Answers1

2

If I look at the image I am tempted to conclude this series has something like a power law behavior, but to say this you need to prove it has 1/f noise.

Let's see if we can verify this assumption:

  1. Build a noise model for 1/f noise: scale / (1 + f ** alpha). This is a mathematical model of 1/f niose with parameters scale (the amplitude) and alpha (describes the ratio of high to low frequencies)

  2. Fit the noise model to the data (in this case fit it to the power spectral density)

  3. Inspect the result. Does it look like the model describes tha data well? If not - try to find a different niose model. But beware of overfitting!

Snip:

from scipy.optimize import curve_fit
import numpy as np
import matplotlib.pyplot as plt


data1 = np.array([1,  0.28571429,  0.20408163,  0.16326531,  0.14285714,
                  0.10204082,  0.08163265,  0.06122449,  0.04081633,  0.02040816])


def one_over_f(f, alpha, scale):
    return scale / f ** alpha    


time_step = 1 / 30
ps1 = np.abs(np.fft.fft(data1)) ** 2
freqs1 = np.fft.fftfreq(data1.size, time_step)

# remove the DC bin because 1/f cannot fit at 0 Hz
ps1 = ps1[freqs1 != 0]
freqs1 = freqs1[freqs1 != 0]

params, _ = curve_fit(one_over_f, np.abs(freqs1), ps1)
ps2 = one_over_f(freqs1, *params)

idx1 = np.argsort(freqs1)

fig, ax = plt.subplots()
ax.plot(freqs1[idx1], ps1[idx1], color='red', label='data1')
ax.plot(freqs1[idx1], ps2[idx1], color='blue', label='fit')
plt.grid(True, which="both")
ax.set_yscale('log')
ax.set_xscale('log')
plt.xlabel("Frequency")
plt.ylabel("Power")
plt.xlim(0,15)
plt.legend()

print('Alpha:', params[0], '\nScale:', params[1])

enter image description here

Visually, the model appears to fit the 1/f spectrum very well. This is not a proof. It is really hard to prove that data has a certain distribution. But you can judge for yourself if a chosen noise model visually fits the data good enough for your needs.

MB-F
  • 22,770
  • 4
  • 61
  • 116