Hypergeometric test (phyper)

Question

I have a question about the hypergeometric test.

I have data like this:

Population size: 5260
Sample size: 131
Number of items in the population that are classified as successes: 1998
Number of items in the sample that are classified as successes: 62

Is this the correct way to compute a hypergeometric test?

phyper(62, 1998, 5260, 131)

Relevant post: [Calculating the probability of gene list overlap between an RNA seq and a ChIP-chip data set](http://stats.stackexchange.com/a/16259/6454) – zx8754 Aug 18 '14 at 10:20

score 25 · Accepted Answer · edited Mar 27 '18 at 08:09

Almost correct. If you look at ?phyper:

phyper(q, m, n, k, lower.tail = TRUE, log.p = FALSE)

x, q vector of quantiles representing the number of white balls drawn
without replacement from an urn which contains both black and white
balls.

m the number of white balls in the urn.

n the number of black balls in the urn.

k the number of balls drawn from the urn.

So using your data:

phyper(62,1998,5260-1998,131)
[1] 0.989247
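A minimal sketch of the same call with the question's numbers held in named variables, so the mapping onto `m`, `n` and `k` is explicit. The variable names are illustrative and are not part of `phyper`'s interface.

```r
# Question's numbers (variable names are illustrative only)
pop_size       <- 5260  # balls in the urn
pop_success    <- 1998  # white balls in the urn
sample_size    <- 131   # balls drawn
sample_success <- 62    # white balls drawn

m <- pop_success             # white balls: successes in the population
n <- pop_size - pop_success  # black balls: failures in the population (3262)
k <- sample_size             # number of balls drawn

# Lower-tail cumulative probability P(X <= 62)
p_lower <- phyper(sample_success, m, n, k)
print(p_lower)
```

Passing the population size 5260 as `n`, as in the question, would describe an urn of 1998 + 5260 balls, which is why the third argument has to be 5260 - 1998.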

edited Mar 27 '18 at 08:09

Roman Luštrik

answered Dec 05 '11 at 10:44

James

Is it not `phyper(61, 1998, 5260-1998, 131)`? – Nicolas Rosewick Dec 06 '11 at 13:18
@NicoBxl No, 62 is the number of successes in the sample, right? – James Dec 06 '11 at 13:31
Yes, it's 62. But I read somewhere that I have to subtract one (slide 20). – Nicolas Rosewick Dec 06 '11 at 13:54
Here: http://users.unimi.it/marray/2007/material/day4/Lecture7.pdf – Nicolas Rosewick Dec 06 '11 at 13:54
@NicoBxl I'm not sure what they are trying to compute, or what you are. But `phyper` gives the cumulative probability up to and including your input observation, i.e. P(Observed 62 or less). If you want P(Observed less than 62), then use 61. If you want *exactly* 62, then use `dhyper`. – James Dec 06 '11 at 14:20
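A short sketch of the three quantities in this comment, using the question's numbers: `phyper(62, ...)` is P(X ≤ 62), `phyper(61, ...)` is P(X < 62), and `dhyper(62, ...)` is P(X = 62), so the first equals the sum of the other two.

```r
m <- 1998; n <- 5260 - 1998; k <- 131

p_le_62 <- phyper(62, m, n, k)  # P(X <= 62)
p_lt_62 <- phyper(61, m, n, k)  # P(X < 62), i.e. P(X <= 61)
p_eq_62 <- dhyper(62, m, n, k)  # P(X = 62)

# Cumulative probability up to 62 = cumulative up to 61 + point mass at 62
print(all.equal(p_le_62, p_lt_62 + p_eq_62))

# It is also the sum of the point probabilities for 0..62
print(all.equal(p_le_62, sum(dhyper(0:62, m, n, k))))
```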

score 21 · Answer 2 · edited Sep 22 '12 at 13:07

I think you want to compute a p-value. In this case, you want

P(Observed 62 or more) = 1-P(Observed less than 62).

So you want

1.0-phyper(62-1, 1998, 5260-1998, 131)

Note the -1 in the first argument. You also need to subtract the result from 1.0 to get the area of the right tail.

Correct me if I'm wrong.
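Written out, the right-tail p-value P(X ≥ 62) is the sum of the point probabilities from 62 up to the largest count that can occur, min(k, m) = 131. A sketch checking that this sum matches `1.0 - phyper(62 - 1, ...)`:

```r
m <- 1998; n <- 5260 - 1998; k <- 131

# Right tail via the complement of the CDF, as in this answer
p_right <- 1.0 - phyper(62 - 1, m, n, k)

# Same quantity: sum P(X = x) over every attainable x >= 62
x_max <- min(k, m)
p_sum <- sum(dhyper(62:x_max, m, n, k))

print(all.equal(p_right, p_sum))
```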

edited Sep 22 '12 at 13:07

AGS

answered Sep 10 '12 at 05:59

Albert

Whether the OP wants the right or left tail will depend on the direction of the alternative hypothesis in the test, which isn't clearly stated in the question. So it could be either. – joran Sep 22 '12 at 20:47
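A sketch of how the tail could follow from the alternative hypothesis. `hyper_pvalue` is a hypothetical helper written for illustration, not a base R function: it returns P(X ≥ q) for over-representation and P(X ≤ q) for under-representation. For reference, the expected count here is k·m/(m + n) = 131 × 1998 / 5260 ≈ 49.8, and 62 lies above it.

```r
# Hypothetical helper (illustration only): one-sided hypergeometric
# p-value for an observed count q, with the tail picked by `alternative`.
hyper_pvalue <- function(q, m, n, k, alternative = c("greater", "less")) {
  alternative <- match.arg(alternative)
  if (alternative == "greater") {
    phyper(q - 1, m, n, k, lower.tail = FALSE)  # P(X >= q)
  } else {
    phyper(q, m, n, k)                          # P(X <= q)
  }
}

m <- 1998; n <- 5260 - 1998; k <- 131
expected <- k * m / (m + n)  # mean of the hypergeometric distribution

print(expected)
print(hyper_pvalue(62, m, n, k, "greater"))  # over-representation
print(hyper_pvalue(62, m, n, k, "less"))     # under-representation
```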
I think it is better to use `lower.tail=FALSE` instead of `1.0-phyper(62-1, 1998, 5260-1998, 131)` – Rachel Rap Aug 08 '21 at 17:51

score 14 · Answer 3 · edited Oct 08 '19 at 19:21

@Albert,

To compute a hypergeometric test, you obtain the same p-value, P(observed 62 or more), using:

> phyper(62-1, 1998, 5260-1998, 131, lower.tail=FALSE)
[1] 0.01697598

Because:

lower.tail: logical; if TRUE (default), probabilities are P[X <= x], 
            otherwise, P[X > x]
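Besides being shorter, `lower.tail = FALSE` avoids cancellation: `1 - phyper(...)` subtracts from 1 a value that rounds to exactly 1 when the upper tail is tiny. A sketch; the extreme count 131 (every draw a success) is chosen for illustration and is not from the question.

```r
m <- 1998; n <- 5260 - 1998; k <- 131

# For the question's count both forms agree
p1 <- 1 - phyper(62 - 1, m, n, k)
p2 <- phyper(62 - 1, m, n, k, lower.tail = FALSE)
print(all.equal(p1, p2))

# Illustrative extreme case: P(X >= 131). The lower tail P(X <= 130) is
# so close to 1 that it is stored as 1, so the complement is lost,
# while lower.tail = FALSE keeps the tiny upper-tail probability.
print(1 - phyper(130, m, n, k))
print(phyper(130, m, n, k, lower.tail = FALSE))
print(dhyper(131, m, n, k))  # same probability, computed directly
```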

edited Oct 08 '19 at 19:21

Emile Zäkiev

answered May 28 '14 at 14:24

Frédéric Bigey

Meng's notes on phyper and fisher.test (which do the same thing, but have a very different interface) are also very helpful: http://mengnote.blogspot.qa/2012/12/calculate-correct-hypergeometric-p.html – Aditya Apr 14 '16 at 05:30
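As the comment says, `fisher.test` on a 2×2 table performs the same one-sided test. A sketch, assuming the table layout below (rows: in / out of the sample; columns: success / failure); with that layout the top-left cell is the observed count, and `alternative = "greater"` gives P(X ≥ 62) conditional on the margins.

```r
m <- 1998; n <- 5260 - 1998; k <- 131; q <- 62

# 2x2 table (filled column by column):
#   rows    = in sample / not in sample
#   columns = success / failure
tab <- matrix(c(q,     m - q,         # success column
                k - q, n - (k - q)),  # failure column
              nrow = 2,
              dimnames = list(sample = c("in", "out"),
                              status = c("success", "failure")))

p_fisher <- fisher.test(tab, alternative = "greater")$p.value
p_phyper <- phyper(q - 1, m, n, k, lower.tail = FALSE)
print(all.equal(p_fisher, p_phyper))
```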

score 0 · Answer 4 · edited Nov 05 '15 at 21:42

I think this test should be as follows:

phyper(62,1998,5260-1998,131-62,lower.tail=FALSE)

Then the sum of all the rows will equal the sum of all the columns. This is important when dealing with contingency tables.
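A sketch of the 2×2 contingency table implied by the question's numbers, to check which margins correspond to `phyper`'s arguments. It assumes rows are in / out of the sample and columns are success / failure; with that layout the column totals are `m` and `n`, and the row total for the sample is `k`, the number of draws.

```r
q <- 62; m <- 1998; n <- 5260 - 1998; k <- 131

# rows = in / out of the sample, columns = success / failure
tab <- matrix(c(q, m - q, k - q, n - (k - q)), nrow = 2,
              dimnames = list(sample = c("in", "out"),
                              status = c("success", "failure")))

print(colSums(tab))  # success = 1998 (m), failure = 3262 (n)
print(rowSums(tab))  # in = 131 (k), out = 5129
print(sum(tab))      # grand total 5260: row and column sums agree

# phyper's arguments read straight off the table and its margins
p_from_margins <- phyper(tab["in", "success"] - 1,
                         colSums(tab)[["success"]],
                         colSums(tab)[["failure"]],
                         rowSums(tab)[["in"]],
                         lower.tail = FALSE)
print(p_from_margins)
```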

edited Nov 05 '15 at 21:42

helencrump

answered Nov 05 '15 at 20:43

user5531047
