3

I've been running into some weird problems when using this code:

positions<-c(58256)
occurrencies<-c(30)
frequency<-c(11/5531777)
length<-c(4)

prob<-c(0)
for(i in 0:(occurrencies-1))
{
  pow<-frequency^i
  pow1<-(1-frequency)^(positions-i)
  bin<-choose(positions, i)
  prob<<-prob+(bin*pow*pow1)
}

Each iteration of this for loop should calculate the binomial probability that, i number of occurrences of the event occur given the frequency. Each iteration also sums up the result. This should result in the prob variable never exceeding 1, but after 7 or so for loop iterations, everything goes to hell and prob excedes 1.

I thought it might be a question of precision digits, so i tried using Rmpfr but to no avail- the same problem persisted.

I was wondering if there are any tips or packages to overcome this situation, or if I'm stuck with this.

David Cain
  • 16,484
  • 14
  • 65
  • 75
Tito Candelli
  • 217
  • 1
  • 10
  • I copy/pasted your code, and while it's true that `prob` ends up greater than 1, it's not by much. `abs(prob - 1) < 1e-10` returns `TRUE`. I can get an additional digit by changing the last line to `prob <- prob + exp(log(bin) + log(pow) + log(pow1))` (note the `<<-` isn't necessary unless you're wrapping this in a function and don't want to `return(prob)`), but if you want more than that, just use `pbinom` as Ben suggests. – Gregor Thomas Oct 11 '12 at 17:16
  • 1
    Side issue: when you used `Rmpfr` , what precision level did you select, and did you make sure you calculated your `frequency` as the ratio of `class:mpfr` numbers? – Carl Witthoft Oct 11 '12 at 17:58
  • @shujaa: it is true that it eds up being not much greater than 1, but the thing that troubles me is that after a certain number of cycles of the for loop, "prob" remains the same even if something is being added: try running the for loop from 0 to occurrencies-3 to check that this is true. as for the side issue: i haven't done that, frequency is actualy a Rmpfr number originating from the ratio of 2 normal numbers. i'll try and keep you posted. – Tito Candelli Oct 11 '12 at 19:16

2 Answers2

3

You can avoid your for loop by doing

prob<-0
i    <- 0:(occurrencies-1)
pow  <- frequency^i
pow1 <- (1-frequency)^(positions-i)
bin  <- choose(positions, i)
prob <- cumsum(prob+(bin*pow*pow1))
[1] 0.8906152 0.9937867 0.9997624 0.9999932 0.9999998 1.0000000 1.0000000 1.0000000 1.0000000
[10] 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
[19] 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
[28] 1.0000000 1.0000000 1.0000000

I don't know if this is your desired result, but surely you can avoid the for loop going this fashion.

See @Ben Bolker's comment and take a look at pbinom function.

Jilber Urbina
  • 58,147
  • 10
  • 114
  • 138
  • this is exactly what i'm trying to avoid. i didn't know about the cumsum function and i welcome the tip; however, i am trying to compute the probability that, given the frequency, the event occurs less than "occurrencies" times, so that should never be 1. what i actually need is the probability that the event occurs "occurrencies" or more times, but calculating those numbers is just crazy so i opted to calculate the reciprocal. – Tito Candelli Oct 11 '12 at 19:24
  • 1
    @user1738815 It sounds like what you're looking for is `pbinom(q = occurencies, size = positions, prob = frequency, lower.tail = FALSE)` – Gregor Thomas Oct 11 '12 at 23:43
  • Thank you so much @shujaa! that's exactly what i needed! how do i select your answer as the correct one? i can only upvote – Tito Candelli Oct 12 '12 at 09:17
  • 1
    I'll add it as an answer, then it will be acceptable. But it really is just echoing Ben Bolker. For future reference, we could only help you when you told us "what I actually need" in the comment above. – Gregor Thomas Oct 12 '12 at 17:23
  • I'll keep that in mind for future questions. Thank you again. – Tito Candelli Oct 13 '12 at 08:56
2

Following Ben Bolker's advice to see ?pbinom

pbinom(q = occurencies, size = positions, prob = frequency, lower.tail = FALSE)
Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294