I have an n x 2 array of flow cytometry data representing forward scatter and side scatter for each cell (there are n cells). These values reflect physical characteristics of the cells, and I wish to filter the cells based on them.
When plotted as a scatter plot, the data shows a dense elliptical cloud plus a sparser scattering of more dispersed cells. I wish to "gate" this data so that I keep the dominant cloud and filter out everything else (in the image below, I would like to retain the dots that are inside the gray elliptical boundary).
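One way to express such a gate is an ellipse at a fixed Mahalanobis distance from the cloud's centre. This assumes the dominant cloud is roughly bivariate Gaussian and contains most of the cells. The sketch below uses NumPy only; `elliptical_gate`, `coverage` and `n_iter` are hypothetical names. It repeatedly re-estimates the centre and covariance from the points currently inside the ellipse until membership stops changing. This is a simple trimmed-estimation loop, not a standard library routine.

```python
import numpy as np


def elliptical_gate(data, coverage=0.95, n_iter=20):
    """Boolean mask of the points inside an ellipse fitted to the dominant cloud.

    Assumes the dominant cloud is roughly bivariate Gaussian and holds most points.
    """
    data = np.asarray(data, dtype=float)
    # Squared Mahalanobis distance of 2-D Gaussian data is chi-square with
    # 2 degrees of freedom, whose quantile has the closed form -2 * ln(1 - p).
    threshold = -2.0 * np.log(1.0 - coverage)
    # Trimming at that threshold shrinks the sample covariance by
    # P(chi2_4 <= t) / P(chi2_2 <= t); this factor is divided back out below.
    shrink = (1.0 - np.exp(-threshold / 2.0) * (1.0 + threshold / 2.0)) / coverage
    # Start from a robust centre (the median) and the covariance of all points.
    center = np.median(data, axis=0)
    cov = np.cov(data, rowvar=False)
    mask = None
    for _ in range(n_iter):
        diff = data - center
        d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
        new_mask = d2 <= threshold
        if mask is not None and np.array_equal(new_mask, mask):
            break  # membership is stable
        mask = new_mask
        if mask.sum() < 3:
            break  # too few points left to estimate a covariance
        # Re-estimate the ellipse from the points currently inside it.
        center = data[mask].mean(axis=0)
        cov = np.cov(data[mask], rowvar=False) / shrink
    return mask
```

With the data loaded as an n x 2 array (for example `np.loadtxt("FS_SS.txt")`, assuming the file holds two whitespace-separated columns), `elliptical_gate(data).astype(int).reshape(-1, 1)` yields a 0/1 column. The `coverage` value sets how much of a Gaussian cloud the ellipse is meant to keep.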
What I would like is a binary n x 1 array where the value at index i is 1 if cell i is within the cloud and 0 if not.
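If the gray boundary were available as a list of polygon vertices (hypothetically, exported from cytometry software or drawn by hand), a point-in-polygon test would produce that n x 1 array directly. Below is a minimal NumPy sketch of the even-odd ray-casting test. `polygon_gate` is a hypothetical name, and points lying exactly on an edge may fall either way.

```python
import numpy as np


def polygon_gate(points, vertices):
    """Return an n x 1 int array: 1 if the point lies inside the polygon, else 0."""
    points = np.asarray(points, dtype=float)
    verts = np.asarray(vertices, dtype=float)
    x, y = points[:, 0], points[:, 1]
    inside = np.zeros(len(points), dtype=bool)
    n = len(verts)
    for i in range(n):
        x1, y1 = verts[i]
        x2, y2 = verts[(i + 1) % n]  # wrap around to close the polygon
        # Does a horizontal ray from each point straddle this edge's y-range?
        crosses = (y1 > y) != (y2 > y)
        # x-coordinate where the edge meets the ray's line (horizontal edges give
        # inf/nan here, but they are already excluded by `crosses`).
        with np.errstate(divide="ignore", invalid="ignore"):
            x_hit = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
        # Each crossing to the right of the point toggles inside/outside.
        inside ^= crosses & (x < x_hit)
    return inside.astype(int).reshape(-1, 1)
```

For example, with the square `[(0, 0), (10, 0), (10, 10), (0, 10)]`, the point `(5, 5)` maps to 1 and `(15, 5)` maps to 0.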
I don't know how to filter out the data outside the ellipse, so I tried K-means with 4 clusters. However, the dominant cluster was detected as a single group (see figure below).
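For reference, a K-means run like the one described can be sketched in NumPy. It uses Lloyd's algorithm with k-means++ seeding, and `kmeans` and `largest_cluster_mask` are hypothetical helper names. Taking the most populated cluster gives a 0/1 array. However, K-means assigns every point to its nearest centroid, so that cluster also absorbs any dispersed cells that happen to be closer to its centroid than to any other centroid. There is no "outside the cloud" label, and the method favours roughly spherical clusters of similar spread.

```python
import numpy as np


def kmeans(data, k, n_iter=100, seed=0):
    """Lloyd's K-means with k-means++ seeding; returns (labels, centroids)."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    # k-means++: pick each new centroid with probability proportional to the
    # squared distance from the nearest centroid chosen so far.
    centroids = [data[rng.integers(len(data))]]
    for _ in range(1, k):
        diffs = data[:, None, :] - np.array(centroids)[None, :, :]
        d2 = (diffs ** 2).sum(axis=-1).min(axis=1)
        centroids.append(data[rng.choice(len(data), p=d2 / d2.sum())])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # Assignment step: every point gets the label of its nearest centroid.
        dist = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        # Update step: move each centroid to the mean of its points
        # (keep the old centroid if a cluster ends up empty).
        new = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


def largest_cluster_mask(labels):
    """n x 1 int array: 1 for points in the most populated cluster, else 0."""
    counts = np.bincount(labels)
    return (labels == counts.argmax()).astype(int).reshape(-1, 1)
```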
I need to be able to detect the dominant cluster programmatically. I would be grateful if someone could help with this.
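One way to find the dominant cluster programmatically is a Gaussian mixture, which is a different technique from K-means. One component models the dense cloud and another, broad component soaks up the dispersed cells. "Dominant" is then taken to mean the component with the largest mixing weight, and the gate is that component's coverage ellipse. Below is a minimal NumPy sketch of expectation-maximisation (EM) under these assumptions. The initialisation is also an assumption: component 0 is seeded at the median, on the expectation that it falls inside the dominant cloud, and the rest are placed by farthest-first traversal. `gmm_em` and `dominant_component_gate` are hypothetical names.

```python
import numpy as np


def gmm_em(data, k=2, n_iter=200, tol=1e-6):
    """Fit a k-component Gaussian mixture by EM; returns (weights, means, covs)."""
    data = np.asarray(data, dtype=float)
    n, d = data.shape
    # Seed component 0 at the median (assumed to lie in the dominant cloud) and
    # the others at the points farthest from existing seeds (sparse cells).
    means = [np.median(data, axis=0)]
    for _ in range(1, k):
        d2 = np.min([((data - m) ** 2).sum(axis=1) for m in means], axis=0)
        means.append(data[np.argmax(d2)])
    means = np.array(means)
    covs = np.array([np.cov(data, rowvar=False)] * k)
    weights = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: log(weight * Gaussian density) for every point and component.
        log_p = np.empty((n, k))
        for j in range(k):
            diff = data - means[j]
            maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(covs[j]), diff)
            _, logdet = np.linalg.slogdet(covs[j])
            log_p[:, j] = np.log(weights[j]) - 0.5 * (maha + logdet + d * np.log(2 * np.pi))
        log_norm = np.logaddexp.reduce(log_p, axis=1)
        resp = np.exp(log_p - log_norm[:, None])  # responsibilities, rows sum to 1
        ll = log_norm.sum()
        # M-step: re-estimate weights, means and covariances from responsibilities.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ data) / nk[:, None]
        for j in range(k):
            diff = data - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + 1e-6 * np.eye(d)
        if ll - prev_ll < tol * abs(ll):
            break  # log-likelihood has stopped improving
        prev_ll = ll
    return weights, means, covs


def dominant_component_gate(data, k=2, coverage=0.95):
    """n x 1 int array: 1 if inside the coverage ellipse of the heaviest component."""
    data = np.asarray(data, dtype=float)
    weights, means, covs = gmm_em(data, k=k)
    j = int(np.argmax(weights))  # "dominant" = largest mixing weight (assumption)
    diff = data - means[j]
    maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(covs[j]), diff)
    threshold = -2.0 * np.log(1.0 - coverage)  # chi-square quantile, 2 dof
    return (maha <= threshold).astype(int).reshape(-1, 1)
```

Choosing the ellipse threshold from the fitted component, rather than from the hard EM assignment, keeps the gate elliptical, which matches the gray boundary in the figure. If the dispersed cells form several distinct groups, a larger `k` may fit better. That choice is a judgement call for this data.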
The sample data is here
FS_SS.txt (hosted at AnonFiles.com)