I have a vector of observations and would like to obtain an empirical p-value for each observation in R. I don't know the underlying distribution, and what I currently do is just

ay <- runif(100, 0, 1000)
quantile(ay)

However, that does not really give me a p-value. How can I obtain a p-value empirically?

Litty
Wertw
  • This sounds more like you don't know the definition of an empirical p-value. If you have questions about statistics, you should ask at [stats.se]. If this is a programming question, then you should be more specific about what you have and what you've tried and describe how it doesn't work. – MrFlick Feb 22 '16 at 21:05
  • @MrFlick - I think the main thing is that I don't know how to do it in R, which is why I didn't post it on Cross Validated. – Wertw Feb 22 '16 at 21:10
  • What "formula" are you using then? Which part don't you know how to do exactly? – MrFlick Feb 22 '16 at 21:28
  • @MrFlick - I don't know which command in R to use to compute it. I thought about ecdf, but I don't know how to go on from there. – Wertw Feb 22 '16 at 21:30
  • To compute what? What's the calculation you want to do? There is no magic "empirical p-value" function. What type of modeling assumptions are you making in order to find your empirical p-value? These are statistical questions, not programming questions. – MrFlick Feb 22 '16 at 21:32
  • I mean, you clearly know how to tackle the problem. Could you give some advice? – Wertw Feb 22 '16 at 21:34
  • I think @MrFlick is trying the Socratic method. As a starting point, we generally use p-values as an indicator of the extremity of values or test results (H1/H0, which of course you know). For what test, idea, assumption, or comparison do you want to see a p-value? – Heroka Feb 22 '16 at 21:46

2 Answers

I think this is what you're looking for:

rank(ay)/length(ay)
TBSRounder
  • could you maybe explain the reasoning, please? – Wertw Feb 22 '16 at 21:09
  • rank(ay) determines where each observation lies in the sample (1 is the smallest value); dividing by the number of observations, length(ay), then gives the fraction of observations that are <= each observation. – TBSRounder Feb 22 '16 at 21:15
  • Is that a statistically reasonable approach to determine a p-value? Really? Without a bootstrap? – Wertw Feb 22 '16 at 21:17
  • Not sure what p-value you're trying to calculate, but this is one way to do it in R, though it looks like this is not what you are looking for. – TBSRounder Feb 22 '16 at 21:34
  • Well, can you do that if you don't know the distribution that generated the data? Is it a general approximate way to obtain the value? – Wertw Feb 22 '16 at 21:39
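
The rank-based expression discussed in these comments can be checked against the empirical CDF. This is a minimal sketch, assuming `ay` has no ties (which holds with probability 1 for continuous draws such as `runif`); the `set.seed` call is an addition here, used only for reproducibility.

```r
set.seed(1)                        # assumption: fixed seed, only for reproducibility
ay <- runif(100, 0, 1000)

# Fraction of observations <= each observation, computed two ways
p_rank <- rank(ay) / length(ay)    # rank 1 is the smallest value
p_ecdf <- ecdf(ay)(ay)             # empirical CDF evaluated at the data points

# The two agree when there are no ties; with ties, rank(ay, ties.method = "max")
# would be needed to match the "<=" definition used by ecdf().
all.equal(p_rank, p_ecdf)
```

Whether this fraction is a meaningful p-value depends on what the sample is being compared against, which the thread leaves open.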

I think what you want is the ecdf function. It returns an empirical cumulative distribution function, which you can then apply directly to values:

ay <- runif(100)
aycdf <- ecdf(ay)

And then

> aycdf(c(.1, .5, .7))
[1] 0.09 0.51 0.73
user295691
  • Hi, could you explain the last line, please? So you get the ecdf and then you feed it values to get the p-values of 0.1, 0.5 and 0.7, correct? – Wertw Feb 22 '16 at 21:56
  • That's exactly it; it evaluates the CDF at those points to get, in probability terms, the empirical probability `P(X <= .5)`, using a step function. For a large enough sample from `runif`, this will produce a function very similar to `punif`. – user295691 Feb 23 '16 at 14:31
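
If a tail proportion is what is meant by an empirical p-value, the idea above can be sketched as follows. The helper name `empirical_p` is hypothetical, and treating the observed sample as the reference distribution is an assumption that the thread does not settle.

```r
# Hypothetical helper: tail proportions of each value in `x`
# relative to a reference sample `ref`.
empirical_p <- function(x, ref) {
  lower <- vapply(x, function(v) mean(ref <= v), numeric(1))  # same as ecdf(ref)(x)
  upper <- vapply(x, function(v) mean(ref >= v), numeric(1))  # upper-tail proportion
  data.frame(x = x, lower = lower, upper = upper)
}

set.seed(1)                        # assumption: fixed seed, only for reproducibility
ay <- runif(100)
res <- empirical_p(c(0.1, 0.5, 0.7), ay)
res
```

The `lower` column reproduces what `ecdf(ay)` returns, while `upper` gives the share of the sample at or above each value; which tail is relevant depends on the hypothesis being tested.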