I have read several posts about clustering 1D arrays in which people say that clustering is not suitable for one-dimensional data and that kernel density estimation (KDE) should be used instead. However, nobody explained how to actually perform clustering with KDE: how do I retrieve cluster labels for the input data?

In scikit-learn, I fitted a kernel density estimate to my univariate (one-dimensional) data:

from sklearn.neighbors import KernelDensity
kde = KernelDensity(kernel='gaussian', bandwidth=0.75).fit(features)

How can I now use it for clustering, that is, how do I retrieve cluster labels for the input data?

I was considering two possible approaches:

a) Use the KDE to produce new 2D input data for some clustering estimator (e.g. k-means). I wanted to retrieve a 2D array in the form of a histogram ([value, frequency]), but I don't know how to get it from the KDE. Is it possible to use the KDE as a new input dataset for a clustering algorithm such as k-means? If yes, how? How can I get a dataset from the KDE?

b) Use the KDE directly to calculate the border between the clusters. In my particular case, I know that there are two clusters and I want to find the border between them. I need to do this computationally, not manually by looking at a plot.

zlatko
  • Just use X_samples = kde.sample(n_samples=X). Then you can fit kmeans. – sascha Jun 14 '16 at 11:04
  • @sascha, what is "X" in your expression? How do I get it? – zlatko Jun 14 '16 at 15:28
  • You are sampling values from your learned KDE. n_samples is just the number of samples to take. – sascha Jun 14 '16 at 15:39
  • @sascha and what do you expect to get from that? That approach makes as little sense as this question. Cluster analysis is about answering a question, not running random functions on random data without a plan. – Has QUIT--Anony-Mousse Jun 17 '16 at 19:56

1 Answer

You don't run a clustering algorithm on a density estimate.

Instead, find the local minima and maxima of the density. The local maxima mark the cluster centers, and the local minima are where to split the data.
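A minimal sketch of that idea, not a definitive implementation: evaluate the fitted KDE on a grid, take the local minima of the density as cluster boundaries, and label each input value by the interval it falls into. The synthetic `values`, the grid size of 1000, and the use of `scipy.signal.argrelextrema` are assumptions made for illustration; only the `KernelDensity` call comes from the question.

```python
import numpy as np
from scipy.signal import argrelextrema
from sklearn.neighbors import KernelDensity

# Hypothetical 1D data with two groups (stand-in for the real input).
rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(8.0, 1.0, 200)])
features = values.reshape(-1, 1)  # scikit-learn expects shape (n_samples, 1)

kde = KernelDensity(kernel='gaussian', bandwidth=0.75).fit(features)

# Evaluate the log-density on a regular grid spanning the data range.
grid = np.linspace(values.min(), values.max(), 1000)
log_density = kde.score_samples(grid.reshape(-1, 1))

# Local minima of the density are candidate borders between clusters.
minima_idx = argrelextrema(log_density, np.less)[0]
boundaries = grid[minima_idx]

# Label each value by the number of boundaries lying below it.
labels = np.searchsorted(boundaries, values)

print("boundaries:", boundaries)
print("number of clusters:", boundaries.size + 1)
```

The number of clusters found this way is the number of local minima plus one, so it depends on the bandwidth: a smaller bandwidth gives a bumpier density and more splits. If exactly two clusters are expected and several minima appear, one option is to increase the bandwidth until a single minimum remains.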

Has QUIT--Anony-Mousse