I'm currently working on some seasonality estimation for a time series dataset.
What I get is a set of candidate frequencies/periods that might occur in the data. This set is somewhat noisy (e.g. it contains periods such as [100, 98, 101, 102] that should actually be "the same").
To obtain sharp period estimates, I try to find peaks via kernel density estimation (KDE, sklearn.neighbors.KernelDensity) as follows:
import numpy as np
from sklearn.neighbors import KernelDensity
from scipy import signal
import matplotlib.pyplot as plt
X1 = np.random.randint(1, 4, 20)
X2 = np.random.randint(10, 13, 200)
X = np.concatenate((X1, X2), axis=0)
# the peaks should be at 2 and 11!
bw = 1
kde = KernelDensity(kernel='gaussian', bandwidth=bw).fit(X.reshape(-1, 1))
estimator = np.linspace(0, 15, 100)
kde_est = np.exp(kde.score_samples(estimator.reshape(-1, 1)))
plt.plot(estimator, kde_est)
peaks_pos = signal.argrelextrema(kde_est, np.greater)[0]
print(estimator[peaks_pos])
# the peaks are at around 2 and 11!
Additionally, I'd like to know what the kernels for this estimation look like. For the Gaussian case, a set of μ and σ should be available for all [default] 40 kernels. Can I access this information? I could not find a clue in the documentation or in the attributes of the kde object. But I'm pretty sure this should be available somewhere.
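For reference, in the textbook definition of a Gaussian KDE there is one kernel per sample: its μ is the sample value and its σ is the bandwidth, and the density is the average of all kernels. The sketch below assumes KernelDensity follows that definition. It uses plain NumPy, and the fixed sample `X`, the helper `gaussian_kernels`, and the grid are made up for illustration. Under that assumption the number 40 would not be a kernel count (it may simply be the default `leaf_size` of the underlying tree).

```python
import numpy as np

# Hypothetical fixed sample (not the data from the question), so the result is reproducible.
X = np.array([1, 2, 2, 3, 10, 11, 11, 11, 12, 12], dtype=float)
bw = 1.0  # plays the role of `bandwidth` in KernelDensity


def gaussian_kernels(samples, bandwidth, grid):
    """Evaluate one Gaussian pdf per sample on `grid`.

    Returns an array of shape (n_samples, n_grid); row i is the kernel
    centred at samples[i] with standard deviation `bandwidth`.
    """
    z = (grid[None, :] - samples[:, None]) / bandwidth
    return np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2.0 * np.pi))


grid = np.linspace(-5, 20, 501)
kernels = gaussian_kernels(X, bw, grid)  # one row per sample
density = kernels.mean(axis=0)           # KDE = average of all kernels

mus = X                        # kernel centres: the samples themselves
sigmas = np.full_like(X, bw)   # every kernel shares the same width
```

Comparing `density` with `np.exp(kde.score_samples(grid.reshape(-1, 1)))` for a KernelDensity fitted on the same `X` and `bw` (e.g. via `np.allclose`) would be a quick way to check the assumption.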
To clarify why I need this:
In the following example, the two peaks are too close together to be found reliably, but I'm sure they would show up in the kernels.
X1 = np.random.randint(1, 4, 20)
X2 = np.random.randint(5, 8, 200)
X = np.concatenate((X1, X2), axis=0)
# the peaks should be at 2 and 6!
bw = 1
kde = KernelDensity(kernel='gaussian', bandwidth=bw).fit(X.reshape(-1, 1))
estimator = np.linspace(0, 15, 100)
kde_est = np.exp(kde.score_samples(estimator.reshape(-1, 1)))
plt.plot(estimator, kde_est)
peaks_pos = signal.argrelextrema(kde_est, np.greater)[0]
print(estimator[peaks_pos])
# a peak is found at around 6, and only sometimes one at 2!