
I have a question about the hypergeometric test.

My data look like this:

Population size: 5260
Sample size: 131
Number of items in the population classified as successes: 1998
Number of items in the sample classified as successes: 62

Is this the correct way to compute a hypergeometric test?

phyper(62, 1998, 5260, 131)
zx8754
Nicolas Rosewick
  • Relevant post: [Calculating the probability of gene list overlap between an RNA seq and a ChIP-chip data set](http://stats.stackexchange.com/a/16259/6454) – zx8754 Aug 18 '14 at 10:20

4 Answers


Almost correct. If you look at ?phyper:

phyper(q, m, n, k, lower.tail = TRUE, log.p = FALSE)

x, q vector of quantiles representing the number of white balls drawn
without replacement from an urn which contains both black and white
balls.

m the number of white balls in the urn.

n the number of black balls in the urn.

k the number of balls drawn from the urn.

So using your data:

phyper(62,1998,5260-1998,131)
[1] 0.989247
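
As a minimal sketch of the argument mapping above (the variable names are illustrative and not from the original post), the call can be written with named quantities and cross-checked against a sum of point probabilities from `dhyper`:

```r
# Values taken from the question; variable names are illustrative only
pop_size         <- 5260  # total items in the population
pop_successes    <- 1998  # m: "white balls" in the urn
sample_size      <- 131   # k: balls drawn
sample_successes <- 62    # q: white balls observed in the sample

# n: "black balls" in the urn, i.e. population failures, not the population size
pop_failures <- pop_size - pop_successes

# P(X <= 62): cumulative probability up to and including the observation
p_lower <- phyper(sample_successes, pop_successes, pop_failures, sample_size)

# Same quantity obtained by summing the point probabilities P(X = 0), ..., P(X = 62)
p_lower_check <- sum(dhyper(0:sample_successes, pop_successes, pop_failures, sample_size))

print(p_lower)
print(p_lower_check)
```

The only change from the call in the question is the third argument: `phyper` expects the number of failures in the population, not the population size.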
Roman Luštrik
James
  • Is it not phyper(**61**,1998,5260-1998,131)? – Nicolas Rosewick Dec 06 '11 at 13:18
  • @NicoBxl No, 62 is the number of successes in the sample right? – James Dec 06 '11 at 13:31
  • Yes, it's 62. But I read somewhere that I have to subtract one (slide 20) – Nicolas Rosewick Dec 06 '11 at 13:54
  • here : http://www.google.be/url?sa=t&rct=j&q=hypergeometric%20test%20r&source=web&cd=4&ved=0CEQQFjAD&url=http%3A%2F%2Fusers.unimi.it%2Fmarray%2F2007%2Fmaterial%2Fday4%2FLecture7.pdf&ei=ex3eTtf5IY-hOs3StawJ&usg=AFQjCNHLKtqn9mWVudBuPKhKpqPfqq2lFw&sig2=mOEW8v9jDhB_glsGtCchzw – Nicolas Rosewick Dec 06 '11 at 13:54
  • @NicoBxl I'm not sure what they are trying to compute, or what you are. But `phyper` gives the cumulative probability up to and including your input observation, i.e. P(Observed 62 or less). If you want P(Observed less than 62), then use 61. If you want *exactly* 62, then use `dhyper`. – James Dec 06 '11 at 14:20

I think you want to compute a p-value. In this case, you want

P(Observed 62 or more) = 1-P(Observed less than 62).

So you want

1.0-phyper(62-1, 1998, 5260-1998, 131)

Note the -1 in the first argument. You also need to subtract the result from 1.0 to get the area of the right tail.

Correct me if I'm wrong.
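
A small sketch of why the -1 matters, using the numbers from the question (the short variable names are only for illustration): the complement of P(X <= 61) should match a direct sum of the point probabilities for 62 through 131, whereas dropping the -1 gives P(X > 62), which leaves out the observed value.

```r
# Numbers from the question; short names are illustrative only
m <- 1998          # successes in the population
n <- 5260 - 1998   # failures in the population
k <- 131           # sample size
x <- 62            # observed successes in the sample

# Right tail including the observation: P(X >= 62) = 1 - P(X <= 61)
p_complement <- 1 - phyper(x - 1, m, n, k)

# Same tail computed directly from the point probabilities P(X = 62), ..., P(X = 131)
p_sum <- sum(dhyper(x:k, m, n, k))

# Without the -1 the observed value is excluded: this is P(X > 62)
p_strict <- phyper(x, m, n, k, lower.tail = FALSE)

print(c(p_complement = p_complement, p_sum = p_sum, p_strict = p_strict))
```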

AGS
Albert
  • Whether the OP wants the right or left tail will depend on the direction of the alternative hypothesis in the test, which isn't clearly stated in the question. So it could be either. – joran Sep 22 '12 at 20:47
  • I think it is better to use `lower.tail=FALSE` instead of `1.0-phyper(62-1, 1998, 5260-1998, 131)` – Rachel Rap Aug 08 '21 at 17:51

@Albert,

To compute a hypergeometric test, you obtain the same p-value, P(observed 62 or more), using:

> phyper(62-1, 1998, 5260-1998, 131, lower.tail=FALSE)
[1] 0.01697598

Because:

lower.tail: logical; if TRUE (default), probabilities are P[X <= x], 
            otherwise, P[X > x]
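
As a hedged cross-check of this upper-tail p-value, the same number can be obtained from a one-sided `fisher.test` on the corresponding 2x2 table. The table layout and variable names below are assumptions made for illustration; they are not part of the original answer.

```r
# 2x2 table built from the question's numbers (layout is an illustrative choice):
# rows = success / failure, columns = in the sample / not in the sample
in_sample  <- c(62, 131 - 62)                          # 62 successes, 69 failures
not_sample <- c(1998 - 62, (5260 - 1998) - (131 - 62)) # 1936 successes, 3193 failures
tab <- cbind(in_sample, not_sample)
rownames(tab) <- c("success", "failure")

# One-sided Fisher exact test for over-representation of successes in the sample
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

# Upper-tail hypergeometric probability P(X >= 62), as in the answer above
p_phyper <- phyper(62 - 1, 1998, 5260 - 1998, 131, lower.tail = FALSE)

print(c(fisher = p_fisher, phyper = p_phyper))
```

The four cells of the table add up to the population size of 5260, so the row and column totals stay consistent with the urn description used by `phyper`.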
Emile Zäkiev
  • Meng's notes on phyper and fisher.test (which do the same thing, but have a very different interface) are also very helpful: http://mengnote.blogspot.qa/2012/12/calculate-correct-hypergeometric-p.html – Aditya Apr 14 '16 at 05:30

I think this test should be as follows:

phyper(62,1998,5260-1998,131-62,lower.tail=FALSE)

Then the sum of all the rows will equal the sum of all the columns. This is important when dealing with contingency tables.

helencrump