
I have some observations from an unknown source. Call this set of observations x, for example:

x = [97 , 102.3, 95.05 , 89.1 , 117 , ...]; % just an example; the data set could contain anything

Provided x is large enough, I should be able to say something about the probability distribution function, right?

So how can I do this in MATLAB, so that I can get p(x = 101) or p(x = 5)? The first one will probably be very high.

Any kind of assumption (normal distribution etc.) is ok, I just want a simple answer for probabilities. And maybe I don't have to explicitly know the PDF, I just need a way to implement p(x = x_star), where x_star is not necessarily a member of x. How can I do this?

Thanks for any help!

My Attempts

The simplest attempt is `length(find(x==x_star))/length(x)`, but this returns zero if, for example, there is no 101 in the observations, even though the shape of the distribution suggests the probability there should be high.
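
To illustrate, here is a minimal sketch using only the five example values shown above and a made-up tolerance `h` (both are assumptions for illustration, not the real data). Exact matching gives zero, while counting the samples that fall within `x_star ± h` gives a nonzero estimate that depends on the chosen window:

```matlab
% Illustrative sample: the five values listed in the example (assumption).
x = [97, 102.3, 95.05, 89.1, 117];
x_star = 101;   % query value that does not occur in x
h = 5;          % hypothetical half-width of the tolerance window

% Exact-match frequency: zero because no sample equals 101 exactly.
p_exact = sum(x == x_star) / numel(x);

% Fraction of samples within x_star +/- h (97 and 102.3 qualify here).
p_window = sum(abs(x - x_star) <= h) / numel(x);

fprintf('exact match: %g, within +/- %g: %g\n', p_exact, h, p_window);
```

With these five values the exact-match estimate is 0 and the windowed estimate is 0.4, but the latter changes with `h`, so it is only a rough empirical stand-in for a fitted distribution.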

Edit:

My function, based on Kamtal's answer:

function p = get_probability_from_sample_set(S, X)
% finds the probability that a sample from S is equal to X
[mu,sigma] = normfit(S);
 z = 1:200;
 xfit = normpdf(z,mu,sigma);
 p = xfit(find(z == X)); 
end

`p` comes back as `[]`. What am I doing wrong?

jeff

1 Answer

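 x = randi(200,[1000 1]);            % example data: 1000 random integers in 1..200
 [mu,sigma] = normfit(x);            % fit a normal distribution to the data
 z = 1:200;                          % integer grid on which the pdf is tabulated
 xfit = normpdf(z,mu,sigma);         % fitted pdf evaluated on the grid
 p = xfit(find(z == round(X)));      % X is the query; look up the nearest integer grid point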

If your values are in [0 0.1],

 x = randi(1000,[1000 1])/10000;     % example data in (0, 0.1]
 [mu,sigma] = normfit(x);
 z = 0:1e-5:0.1;                     % fine grid covering the data range
 xfit = normpdf(z,mu,sigma);
 [~, idx] = min(abs(z - X));         % index of the grid point nearest to the query X (first one on a tie)
 p = xfit(idx);
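
The grid lookups above return nothing whenever the query is not exactly a grid point, which is why a lookup such as `xfit(find(z == X))` can come back empty. One way around this, sketched below, is to skip the grid and evaluate the fitted density at the query directly. The sample reuses the five values from the question, and `x_star` and `h` are hypothetical names and values chosen for illustration. Note that `normpdf` returns a density, not a probability; for a continuous distribution, a probability is obtained over an interval, here via `normcdf`.

```matlab
% Illustrative sample: the five values shown in the question (assumption).
S = [97, 102.3, 95.05, 89.1, 117];
x_star = 101.3;   % hypothetical query that is not on an integer grid
h = 0.5;          % hypothetical half-width of the interval of interest

% Fit a normal distribution to the sample (Statistics Toolbox).
[mu, sigma] = normfit(S);

% The grid lookup is empty because 101.3 is not an element of 1:200.
z = 1:200;
idx = find(z == x_star);

% Evaluate the fitted density directly at the query; no grid is needed.
density = normpdf(x_star, mu, sigma);

% Probability of landing within x_star +/- h: difference of the fitted CDF
% at the two interval endpoints.
p_interval = normcdf(x_star + h, mu, sigma) - normcdf(x_star - h, mu, sigma);

fprintf('density = %.4g, P(|x - x_star| <= %.2g) = %.4g\n', density, h, p_interval);
```

Because nothing is tabulated, this works for any real-valued query and any data scale, at the cost of relying entirely on the normal assumption.
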
Rashid
  • Thanks! This looks right, but does this assume integer values? That was just an example; my actual values are floats, and they range in order of magnitude from 1e-3 to 1e3. Will this work for all types of x and x_star? – jeff Nov 01 '14 at 19:46
  • @halilpazarlama, since `z = 1:200;` the grid contains only integers. If you change it to `z = 1:stepsize:200;` you get non-integer grid points as well, depending on your data. – Rashid Nov 01 '14 at 19:48
  • Oh so it should be `z = min(S):stepsize:max(S)` ? Please see the edit to the question. – jeff Nov 01 '14 at 19:50
  • @halilpazarlama, yes. you can plot `xfit` to see the pdf. – Rashid Nov 01 '14 at 19:51
  • Ok so does `p = xfit(..)` make sense? Or how do I get the probability? – jeff Nov 01 '14 at 19:53
  • @halilpazarlama, `p=xfit(find(z == x_star))`, and try `z=unique(x);`, I think that is more efficient. – Rashid Nov 01 '14 at 19:55
  • Ok but this still gives zero for queries that are not in the sample set, right? – jeff Nov 01 '14 at 19:57
  • @halilpazarlama, I forgot that you want probabilities for data that aren't in the set. You have to use `z = min(S):stepsize:max(S)` with a step small enough to cover all the values whose probability you want. – Rashid Nov 01 '14 at 19:59
  • OK, thanks. This works if the query is in z, but still not for all values unless stepsize goes to zero (which runs out of memory). I still think there should be a way that works for **every** query. – jeff Nov 01 '14 at 20:02
  • @halilpazarlama, Do you have your queries in an array? so we could somehow insert them all in `z`, or they are random? – Rashid Nov 01 '14 at 20:03
  • No, they are not pre-determined. I want to make a system that gives a probability for any given query. So we can say that the query set is the set of real numbers. **edit** : Maybe we can use the "closest" member of z. – jeff Nov 01 '14 at 20:05
  • @halilpazarlama, you could also use `p = xfit(find(z == round(X)));`. They will almost be the same anyway. – Rashid Nov 01 '14 at 20:06
  • no, neither the query set nor the data set is limited to integers. – jeff Nov 01 '14 at 20:08
  • @halilpazarlama, Ok that is the reason to use `round`. What would be the problem? – Rashid Nov 01 '14 at 20:09
  • Hmm. What if my data set is between 0 and 0.1? – jeff Nov 01 '14 at 20:17
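
For the cases raised in these comments (arbitrary real-valued queries, data on a small scale), a different technique from the normal fit used in the answer is a kernel density estimate, which makes no assumption about the shape of the distribution. The sketch below uses `ksdensity` from the Statistics Toolbox with its default bandwidth. The sample values and the query `x_star` are made up for illustration, and the result is again a density rather than a probability.

```matlab
% Hypothetical sample on a small scale, standing in for data in [0, 0.1].
x = [0.012, 0.034, 0.035, 0.051, 0.058, 0.066, 0.079, 0.093];
x_star = 0.0421;   % arbitrary real-valued query that is not in x

% Kernel density estimate evaluated directly at the query point
% (default bandwidth, no grid required).
f = ksdensity(x, x_star);

% For comparison: the normal fit evaluated at the same point.
[mu, sigma] = normfit(x);
g = normpdf(x_star, mu, sigma);

fprintf('kernel estimate = %.4g, normal fit = %.4g\n', f, g);
```

Which of the two is preferable depends on whether a normal shape is plausible for the data; with very few samples, the kernel estimate is sensitive to the bandwidth.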