
This is my problem:

I have the following data "A", which looks like this:

[image: plot of the data "A", with the apparent peaks circled in red]

As you can see, I have circled the apparent peaks in red. The best defined are peaks 2 and 7; I call them well defined because their standard deviation is low compared with the other peaks (especially for the second one).

What I need is a way (any way) to get the values and the standard deviations of the n peaks in a numeric array.

I have tried clustering, but the results were not good:

[image: result of clustering the data with kmeans, with one cluster circled in red]

First, I used the MATLAB function "kmeans", and I realized that this algorithm does not group the peaks the way I need. As you can see in the picture above, the cluster in the red circle contains at least 3 or 4 peaks. Also, kmeans requires you to set the number of clusters, and I need to determine it automatically.

I hope someone can give me some ideas, or a way to get better results. Thanks.

P.S.: The data "A" is available at the following link.

https://drive.google.com/file/d/0B4WGV21GqSL5a2EyQ2l0SHZURzA/edit?usp=sharing

lisandrojim
  • The data you have posted has only one peak, with a relatively higher standard deviation than what you have shown. – Autonomous Jul 14 '14 at 18:01
  • My apologies, I have changed the file for the correct one. – lisandrojim Jul 14 '14 at 19:22
  • Run these commands on your data: `[pks,locs]=findpeaks(A(:,2),'threshold',0.15); scatter(A(:,1),A(:,2)); hold on; scatter(A(locs,1),A(locs,2),'ro','filled')`. As you will see, they are by no means perfect peaks, but they are a good starting point for clustering. I took these points and gave them to kmeans as starting points, but that result was worse. – Autonomous Jul 14 '14 at 20:45

1 Answer


The problem is that your axes have very different meanings.

K-means minimizes within-cluster variance. But variance in X is something entirely different from variance in Y, isn't it? Furthermore, methods like this will split your data in both X and Y, whereas I assume you want the data to be partitioned along the X axis only.

I suggest the following: consider the Y axis to be a weight and the X axis to be a position.

Then perform weighted density estimation, and look for regions of low density to separate your clusters.

I can't help you with MATLAB. I don't use it.

Mathematically, what you want to do is place a Gaussian at each point, with area Y and center X. Then find the minima and maxima of the sum of these Gaussians. See the Wikipedia article on kernel density estimation for details, except that you want to use the Y axis as weights. You could maybe also use 1/Y as the standard deviation if you don't want to use weights.
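To make the idea concrete, here is a minimal sketch in plain Python (not MATLAB), using only the standard library. It sums one Gaussian per point (center X, area Y), cuts the X axis at the local minima of that sum, and reports a weighted mean and standard deviation for each segment. The function names, the bandwidth, the grid size, and the sample data are all assumptions made up for illustration; in particular the bandwidth has to be tuned for the real data, since it controls how many peaks are found.

```python
import math

def weighted_kde(xs, ws, grid, bandwidth):
    """Sum of Gaussians with center x and area w, evaluated at each grid point."""
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    return [
        sum(w * norm * math.exp(-0.5 * ((g - x) / bandwidth) ** 2)
            for x, w in zip(xs, ws))
        for g in grid
    ]

def local_minima(values):
    """Indices of interior local minima.

    The <= on the right-hand side makes a flat-bottomed valley
    count once, at its left end.
    """
    return [i for i in range(1, len(values) - 1)
            if values[i] < values[i - 1] and values[i] <= values[i + 1]]

def peak_stats(xs, ws, bandwidth, grid_size=200):
    """Return a (weighted mean, weighted std) pair per density peak."""
    lo, hi = min(xs), max(xs)
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    density = weighted_kde(xs, ws, grid, bandwidth)

    # Low-density points on the X axis separate neighbouring peaks.
    cuts = [grid[i] for i in local_minima(density)]
    edges = [-math.inf] + cuts + [math.inf]

    stats = []
    for left, right in zip(edges[:-1], edges[1:]):
        segment = [(x, w) for x, w in zip(xs, ws) if left <= x < right]
        total = sum(w for _, w in segment)
        if total <= 0:
            continue  # no weight in this segment, nothing to report
        mean = sum(x * w for x, w in segment) / total
        var = sum(w * (x - mean) ** 2 for x, w in segment) / total
        stats.append((mean, math.sqrt(var)))
    return stats

# Made-up sample data: two well-separated groups of points.
xs = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]   # positions (X axis)
ws = [1.0, 2.0, 1.0, 1.0, 3.0, 1.0]   # weights (Y axis)
stats = peak_stats(xs, ws, bandwidth=0.5)
for mean, std in stats:
    print(f"peak at {mean:.3f}, std {std:.3f}")
```

The number of peaks is not set in advance; it falls out of how many minima the density has at the chosen bandwidth. A smaller bandwidth splits the data into more, narrower peaks, and a larger one merges neighbours.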

Has QUIT--Anony-Mousse