I have a sensor that output data consist of one attribute (mono value). An example of punch of sequenced data is as follows:
sample: 199 200 205 209 217 224 239 498 573 583 583 590 591 594 703 710 711 717 719 721 836 840 845 849 855 855 856 857 858 858 928 935 936 936 942 943 964 977
You can see the data from the first image input.
The data is divided into levels. The number of levels is given for me (5 levels in this example). However, the number of samples for each level is unknown, as well as the distances between the levels are also unknown.
I need to exclude the outliers and define the center of each level (look at the second image output.
The red samples represent the outliers and the yellow represent the centers of levels). Is there any algorithm, mathematical formula, c++ code may help me to achieve this requirement?
I tried KMeans (with K = 5 in this example) and I got bad result because of the random initial K centroids. Most time some inintial centroids share the same level that let this level become two clusters, whereas other two levels belongs to one cluster. If I set the initial centroids manually by selecting one centroid from each level I get very good results.