Distribution empirical in R

Question

I have a vector of observations and would like to obtain an empirical p-value for each observation with R. I don't know the underlying distribution, and what I currently do is just

ay <- runif(100, 0, 1000)
quantile(ay)

However, that does not really give me a p-value. How can I obtain a p-value empirically?

This sounds more like you don't know the definition of an empirical p-value. If you have questions about statistics, you should ask on Cross Validated. If this is a programming question, then you should be more specific about what you have and what you've tried, and describe how it doesn't work. — MrFlick, Feb 22 '16 at 21:05
@MrFlick - I think the main thing is that I don't know how to do it in R; that's why it is not on Cross Validated. — Wertw, Feb 22 '16 at 21:10
What "formula" are you using then? Which part don't you know how to do exactly? — MrFlick, Feb 22 '16 at 21:28
@MrFlick - I don't know which command in R to use to compute it. I thought about ecdf, but then I don't know how to proceed from there. — Wertw, Feb 22 '16 at 21:30
To compute what? What's the calculation you want to do? There is no magic "empirical p-value" function. What type of modeling assumptions are you making in order to find your empirical p-value? These are statistical questions, not programming questions. — MrFlick, Feb 22 '16 at 21:32
So, you clearly know how to tackle the problem; would it be possible for you to give some advice? — Wertw, Feb 22 '16 at 21:34
I think @MrFlick is trying the Socratic method. As a starting point, we generally use p-values as an indicator of how extreme values or test results are (H1/H0, which of course you know). For what test/idea/assumption/comparison do you want to see a p-value? — Heroka, Feb 22 '16 at 21:46
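To make the "extremity" idea concrete, here is a minimal sketch in R. It assumes the reference sample is the asker's `ay` and that the values being judged are new observations; `empirical_p` is a hypothetical helper, not a base R function. The lower and upper tails are plain proportions of the sample, and the two-sided value uses one common convention (double the smaller tail, cap at 1), which is a choice rather than the only definition of an empirical p-value.

```r
# Hypothetical helper: empirical tail proportions of new value(s) x
# relative to a reference sample `ref` (assumes `ref` has no NAs).
empirical_p <- function(x, ref) {
  lower <- vapply(x, function(v) mean(ref <= v), numeric(1))  # P(X <= x)
  upper <- vapply(x, function(v) mean(ref >= v), numeric(1))  # P(X >= x)
  # One common two-sided convention: double the smaller tail, cap at 1
  two_sided <- pmin(1, 2 * pmin(lower, upper))
  data.frame(x = x, lower = lower, upper = upper, two_sided = two_sided)
}

set.seed(1)                 # reproducible example data
ay <- runif(100, 0, 1000)   # same kind of sample as in the question
empirical_p(c(10, 500, 990), ay)
```

Values near either end of the sample get a small two-sided proportion; values near the middle get one close to 1.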

score 1 · Answer 1 · answered Feb 22 '16 at 20:56


I think this is what you're looking for:

rank(ay)/length(ay)

TBSRounder

Could you maybe explain the reasoning, please? – Wertw Feb 22 '16 at 21:09
rank(ay) determines where each value lies in the distribution (1 is the smallest value); dividing by the number of observations, length(ay), then gives the proportion of observations that are <= each observation (assuming there are no ties). – TBSRounder Feb 22 '16 at 21:15
Is that really a statistically reasonable approach to determine a p-value, without bootstrapping? – Wertw Feb 22 '16 at 21:17
I'm not sure what p-value you're trying to calculate; this is one way to do it in R, though it looks like it is not what you are looking for. – TBSRounder Feb 22 '16 at 21:34
Well, can you do that if you don't know the distribution that generated the data? Is it a general, approximate way to obtain the value? – Wertw Feb 22 '16 at 21:39
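The reasoning in this thread can be checked directly. A minimal sketch, assuming continuous data so that ties do not occur: `rank(ay)/length(ay)` matches the proportion of observations at or below each value, which is also what `ecdf(ay)` returns when evaluated at the sample itself. Neither computation assumes a particular distribution; both use only the observed values. With ties, `rank()` averages by default, so `ties.method = "max"` is what reproduces the counting definition.

```r
set.seed(42)                 # reproducible example data
ay <- runif(100, 0, 1000)

p_rank  <- rank(ay) / length(ay)                              # the answer's formula
p_count <- vapply(ay, function(v) mean(ay <= v), numeric(1))  # proportion <= each value
p_ecdf  <- ecdf(ay)(ay)                                       # ECDF evaluated at the sample

all.equal(p_rank, p_count)   # TRUE when there are no ties
all.equal(p_rank, p_ecdf)    # TRUE when there are no ties

# With ties, rank() averages by default; ties.method = "max" matches the count
tied <- c(1, 2, 2, 3)
rank(tied) / length(tied)                        # 0.250 0.625 0.625 1.000
rank(tied, ties.method = "max") / length(tied)   # 0.25 0.75 0.75 1.00
ecdf(tied)(tied)                                 # 0.25 0.75 0.75 1.00
```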

score 1 · Answer 2 · answered Feb 22 '16 at 21:40


I think what you want is the ecdf function. This returns an empirical cumulative distribution function, which you can call directly on new values:

ay <- runif(100)
aycdf <- ecdf(ay)

And then

> aycdf(c(.1, .5, .7))
[1] 0.09 0.51 0.73

user295691

Hi, could you explain the last line, please? So you get the ecdf and then feed in values to get the p-values at 0.5 and 0.7, correct? – Wertw Feb 22 '16 at 21:56
That's exactly it; it evaluates the CDF at those points to get, in probability terms, the empirical probability `P(X <= .5)`, using a step function (the upper-tail probability `P(X > .5)` is `1 - aycdf(.5)`). For a large enough sample from `runif`, this will create a function very similar to `punif`. – user295691 Feb 23 '16 at 14:31
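A minimal sketch of both points in that reply, using only base R (`ecdf`, `runif`, `punif`). The ECDF gives the lower-tail proportion P(X <= x), so the upper-tail value is `1 - aycdf(x)`. The second part measures the largest gap between the ECDF and `punif` on a grid, which tends to shrink as the sample grows. The seed, grid spacing, sample sizes, and the `max_gap` helper are arbitrary choices for illustration.

```r
set.seed(123)                # reproducible example data
ay    <- runif(100)
aycdf <- ecdf(ay)
x     <- c(0.1, 0.5, 0.7)

lower <- aycdf(x)            # P(X <= x): proportion of ay at or below x
upper <- 1 - aycdf(x)        # P(X >  x): upper-tail proportion
all.equal(lower, vapply(x, function(v) mean(ay <= v), numeric(1)))  # TRUE

# Hypothetical helper: largest gap between the ECDF of n uniform draws
# and the true uniform CDF, checked on a fine grid of points in [0, 1].
grid <- seq(0, 1, by = 0.001)
max_gap <- function(n) {
  Fn <- ecdf(runif(n))
  max(abs(Fn(grid) - punif(grid)))
}
sapply(c(100, 10000), max_gap)   # the gap for n = 10000 is typically much smaller
```

Evaluating on a grid only approximates the largest gap, but it is enough to see the ECDF approach `punif` as n increases.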
