-1

I have generated a data set in MATLAB and then embedded some outliers in it. I would like to plot it, but since I'm new to MATLAB I don't know how to distinguish the outliers from the inliers with a different marker or color. I want to mark three kinds of points: those that are outlying with respect to the x axis, the y axis, and both. This is the MATLAB code:

pd = makedist('Normal');
rng(38)
a = random(pd,100,1);
b  = datasample(1:100,40,'Replace',false);
pd1 = makedist('Normal','mu',10*sqrt(2),'sigma',0.1);

a(b)=random(pd1,40,1);
a=reshape(a,[50,2]);
plot(a(:,1),a(:,2),'O') 

I would appreciate any help.

  • You'd need to specify separately which points are outliers. I ran your example and in my opinion there weren't any outliers; everything is fairly well clustered. – IKavanagh Sep 23 '15 at 13:00
  • @IKavanagh The values set by a(b)=random(pd1,40,1) are the outliers. I want to mark these points with three different colors or markers. Can you help me? – user2802663 Sep 23 '15 at 13:06
  • I don't have enough reputation to post my figure! – user2802663 Sep 23 '15 at 13:10
  • Some of the `a(b)` aren't outliers though; they have values similar to `a` and fall within clusters of `a`. That is the nature of random numbers. Do you mean that something like `a > 2 | a < -2` marks the outliers? You can paste a link to your figure and somebody else can embed it. – IKavanagh Sep 23 '15 at 13:15
  • How can I paste a link to my figure? – user2802663 Sep 23 '15 at 13:19

2 Answers

0

In this example I assumed that points lying more than 3 units beyond the centroid along the x axis are outliers and marked them red (normal points are marked blue):

centroid = mean(a);
distx = a(:,1) - centroid(1);
disty = a(:,2) - centroid(2);

outliers_x = distx > 3;

plot(centroid(1), centroid(2), 'xk')
hold on 
plot(a(outliers_x,1),a(outliers_x,2),'or')
plot(a(~outliers_x,1),a(~outliers_x,2),'ob')
hold off

Note that I've also displayed the centroid as a black "X" mark.

hold on / hold off are used to "stack" several plots (or images) on the same axes. You may want to read the reference for hold(). The MATLAB documentation on line specifications lists the available markers and colors.
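The same masking idea can be extended to the y axis and to points outlying in both directions. The following is only a sketch under assumptions: the cut-off of 3 is reused from the example above, and the names `thr`, `out_x`, `out_y`, `only_x`, `only_y`, `both` and `none` are illustrative, not from the original code. The data generation is copied from the question so that the block runs on its own.

```matlab
% Recreate the question's data (requires the Statistics and Machine
% Learning Toolbox, as the question's own code does).
rng(38)
pd  = makedist('Normal');
pd1 = makedist('Normal','mu',10*sqrt(2),'sigma',0.1);
a = random(pd,100,1);
b = datasample(1:100,40,'Replace',false);
a(b) = random(pd1,40,1);
a = reshape(a,[50,2]);

centroid = mean(a);
thr = 3;                                 % assumed cut-off, as in the example above
out_x = (a(:,1) - centroid(1)) > thr;    % far beyond the centroid along x
out_y = (a(:,2) - centroid(2)) > thr;    % far beyond the centroid along y

% Four mutually exclusive groups built from the two logical masks
only_x = out_x & ~out_y;
only_y = ~out_x & out_y;
both   = out_x & out_y;
none   = ~out_x & ~out_y;

figure
hold on
plot(a(none,1),   a(none,2),   'ob', 'DisplayName', 'inliers')
plot(a(only_x,1), a(only_x,2), 'or', 'DisplayName', 'outlying in x')
plot(a(only_y,1), a(only_y,2), 'og', 'DisplayName', 'outlying in y')
plot(a(both,1),   a(both,2),   'ok', 'DisplayName', 'outlying in both')
plot(centroid(1), centroid(2), 'xk', 'DisplayName', 'centroid')
hold off
legend show
```

Because each point belongs to exactly one mask, every point is drawn once, and changing `thr` is the only adjustment needed to tighten or loosen the definition of an outlier.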

Paweł Kłeczek
  • Thanks for the answer, but could you please mark the points that are far from the bulk of the data along the y axis, the x axis, and both? If you run the example you will see which ones I mean. – user2802663 Sep 23 '15 at 13:23
  • OK, I updated the answer so that it includes the computation of the centroid as well as one constraint on the distance along an axis. Is it clear now how to filter out the desired outliers? – Paweł Kłeczek Sep 23 '15 at 13:49
  • @ Paweł Kłeczek thank you for the answer; maybe I did not explain well what I really want. I want to show the observations with **a(:,1) > 10 as red**, those with **a(:,2) > 10 as green**, those where **a(:,1) and a(:,2) are both greater than 10 as black**, and the remaining observations (where neither a(:,1) nor a(:,2) is greater than 10) as blue. – user2802663 Sep 27 '15 at 05:47
0

To answer my own question, I wrote the following code, which plots 4 groups of observations in different colors.

pd = makedist('Normal');
rng(38)
a = random(pd,100,1);
b  = datasample(1:100,40,'Replace',false);
pd1 = makedist('Normal','mu',10*sqrt(2),'sigma',0.1);

a(b)=random(pd1,40,1);
a=reshape(a,[50,2]); 

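hold on;
aa = (a >= 10);              % 50x2 logical: column k flags a(:,k) >= 10
aaa = [zeros(50,1), aa];     % one RGB row per point, red channel always 0
% Resulting colors: black = neither, green = a(:,1) >= 10,
% blue = a(:,2) >= 10, cyan = both
n = 50;
for i = 1:n
    plot(a(i,1), a(i,2), 'o', 'Color', aaa(i,:));
end
hold off;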
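The loop can also be replaced by a single scatter call that takes one RGB row per point. The sketch below assumes the color scheme requested in the comments (red for a(:,1) > 10, green for a(:,2) > 10, black for both, blue otherwise); the names `thr`, `in_x`, `in_y`, `g`, `palette` and `C` are illustrative, and the data generation is repeated so that the block runs on its own.

```matlab
% Recreate the data (requires the Statistics and Machine Learning Toolbox).
rng(38)
pd  = makedist('Normal');
pd1 = makedist('Normal','mu',10*sqrt(2),'sigma',0.1);
a = random(pd,100,1);
b = datasample(1:100,40,'Replace',false);
a(b) = random(pd1,40,1);
a = reshape(a,[50,2]);

thr  = 10;                 % threshold taken from the comment thread
in_x = a(:,1) > thr;       % large first coordinate
in_y = a(:,2) > thr;       % large second coordinate

% Group index per point: 1 = neither, 2 = x only, 3 = y only, 4 = both
g = 1 + in_x + 2*in_y;

palette = [0 0 1;          % 1: blue
           1 0 0;          % 2: red
           0 1 0;          % 3: green
           0 0 0];         % 4: black
C = palette(g,:);          % 50x3 matrix, one RGB triplet per point

figure
scatter(a(:,1), a(:,2), 36, C)   % 36 is the marker area in points^2
```

Encoding the two logical tests as a single group index keeps the mapping from group to color in one small table, so changing a color means editing one row of `palette`.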