
I am trying to get the empirical distribution of two different series, p and q.

I used the syntax [f1,x]=ecdf(p) and [f2,x]=ecdf(q). Although these are two completely different series, they produce the same values for f1 and f2. I guess this is because of the evaluation points x that MATLAB generates by default, which I assume are the same for the two series. What is the correct way of generating the ECDF?

With p and q defined as follows:

p=[3.827880237 3.843230114 3.832979798 3.814851094 3.798070125 3.793802374 3.790420184 3.758288905 3.703854270 3.699633917 3.722435113 3.685122405 3.671987586 3.677439264 3.673511977 3.706842154 3.69299597];

q=[3.832763324 3.848230872 3.835789699 3.819249605 3.802654468 3.801538272 3.800867956 3.763986927 3.711618941 3.703275334 3.744550651 3.688129173 3.673511977 3.681603045 3.679081612 3.716737782 3.702782359];
R.Falque
indu mann

1 Answer


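You are overwriting x: both calls store their evaluation points in the same variable, even though those points differ between the two series. If you use the following code, you will get different curves.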

[f1,x1]=ecdf(p);
[f2,x2]=ecdf(q);

figure; plot(x1, f1, 'b', x2, f2, 'r');  % data values on the horizontal axis, cumulative probability on the vertical axis

The cumulative distributions are different, as expected:

[Figure: empirical CDFs of p (blue) and q (red)]

R.Falque
  • x1 and x2 are different. But will f1 and f2 be the same? I am getting the same values for f1 and f2. – indu mann Mar 05 '16 at 07:51
  • I am going to use the values of f1 and f2 in a copula to generate a multivariate distribution, so it's important to generate different values. Any help is appreciated. – indu mann Mar 05 '16 at 08:20
  • If I am not wrong, f is your cumulative probability. Since each value occurs only once in your arrays, f1 and f2 both increase from 0 to 1 in equal steps of one divided by the length of your vector. – R.Falque Mar 05 '16 at 14:11
  • So that means that for the same sample size, whatever the data, the f's will always be the same? – indu mann Mar 05 '16 at 19:41
  • Not necessarily; it depends on how often each value occurs in the sample. – R.Falque Mar 06 '16 at 00:09
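
The point made in the last few comments can be checked directly. The following is a minimal sketch, assuming the Statistics and Machine Learning Toolbox function ecdf as used in the question. The short series r is made up purely for illustration and is not part of the original data. The tolerance 1e-12 is likewise an arbitrary choice to guard against floating-point round-off.

```matlab
% Data from the question (each series has 17 distinct values).
p = [3.827880237 3.843230114 3.832979798 3.814851094 3.798070125 3.793802374 3.790420184 3.758288905 3.703854270 3.699633917 3.722435113 3.685122405 3.671987586 3.677439264 3.673511977 3.706842154 3.69299597];
q = [3.832763324 3.848230872 3.835789699 3.819249605 3.802654468 3.801538272 3.800867956 3.763986927 3.711618941 3.703275334 3.744550651 3.688129173 3.673511977 3.681603045 3.679081612 3.716737782 3.702782359];

% Keep the evaluation points in separate variables so neither is overwritten.
[f1, x1] = ecdf(p);
[f2, x2] = ecdf(q);

% With no repeated values, f depends only on the sample size n:
% it climbs from 0 to 1 in n equal steps of 1/n. So f1 and f2 coincide,
% while x1 and x2 (the sorted data values) differ.
n = numel(p);
sameF = max(abs(f1 - f2)) < 1e-12;   % true: identical step heights
sameX = isequal(x1, x2);             % false: different data values

% Repeated values change the step heights: a value that occurs k times
% contributes a jump of k/n. Here r (hypothetical data) has the value 1 twice,
% so its ECDF jumps by 2/4 at 1, then by 1/4 at 2 and at 3.
r = [1 1 2 3];
[f3, x3] = ecdf(r);

fprintf('f1 equals f2: %d, x1 equals x2: %d\n', sameF, sameX);
disp([x3 f3]);
```

So equal f vectors here do not mean the two distributions are the same. The difference between p and q is carried by x1 and x2, and f only changes shape when some values are repeated.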