0

I have a vector with approximately 1000 elements from roughly 1 to 10 and I want the product of all elements multiplied together modulo 10^6. I.e., I want something akin to

x <- as.integer(runif(1000, 1, 10))
prod(x) %% 1000000

However, since prod(x) evaluates to Inf, the above simply returns NaN.

In my actual application (which isn't exactly minimal, so I won't put it here), I've done this with a for loop, like

x <- as.integer(runif(1000, 1, 10))

for (i in 1:length(x)) {
  if (i == 1) {
    y <- x[i]
  } else {
    y <- y * x[i]
    y <- y %% 1000000
  }
}
y

Initially I simply wanted to ask for help coming up with a better solution than a for loop, assuming I should be avoiding this in R. However, in writing this question I'm now equally puzzled by the fact that I can't seem to generate a working MWE for the for loop. The above code returns 0, yet it works fine when I remove as.integer(). I've spent longer than I care to admit trying to figure out why this is so, but I'm stumped.

Two questions, then:

  • Why doesn't the above MWE work? (And why does it work without as.integer?)
  • What's a good way to implement modular multiplication in R that avoids overflow issues? Ideally I'd be curious for a solution in base R, i.e. without something like the gmp package. I've tried with sapply/lapply, but to no avail so far.

I've found similar questions posed in terms of other languages, but I struggle to decode the answers phrased in these terms, I'm afraid.

MSR
  • 3
  • 1

1 Answers1

0

Let's check what happens:

set.seed(42)
x <- as.integer(runif(1000, 1, 10))

y <- integer(1000)
for (i in 1:length(x)) {
  if (i == 1) {
    y[i] <- x[i]
  } else {
    y[i] <- y[i-1] * x[i]
    y[i] <- y[i] %% 1000000
  }
}
y[1:30]
# [1]      9     81    243   1944  11664  58320 408240 816480 898880 292160 460800 225600  30400  91200 456000 104000 936000
#[18] 872000 360000 160000 440000 880000 920000 280000 280000 400000 600000 400000      0      0

OK, so the problem appears in the 29th iteration:

400000 * x[29]
#[1] 2e+06

Why does it not happen with floating point numbers?

set.seed(42)
x <- (runif(1000, 1, 10))

y <- integer(1000)
for (i in 1:length(x)) {
  if (i == 1) {
    y[i] <- x[i]
  } else {
    y[i] <- y[i-1] * x[i]
    y[i] <- y[i] %% 1000000
  }
}
y[1:30]
 #[1] 9.233254e+00 8.710356e+01 3.114175e+02 2.638961e+03 1.788083e+04 1.014176e+05 7.737451e+05 7.115236e+05 9.187133e+05
#[10] 7.484847e+05 8.319994e+05 2.167078e+05 3.966498e+04 1.308492e+05 6.752649e+05 3.880944e+05 8.048925e+05 6.559748e+05
#[19] 4.602498e+05 7.812873e+05 1.380611e+05 3.104154e+05 7.312040e+04 6.961072e+05 2.125757e+05 1.963559e+05 8.859247e+05
#[28] 1.076669e+05 5.407815e+05 6.096419e+05

As you see, you never get a multiple of 1e6.

Note that your loop does not the same as the vectorized attempt since you calculate the modulo in each iteration.

You could look into using arbitrary precision numbers as an alternative (see package Rmpfr). Or use a smarter algorithm.

Roland
  • 127,288
  • 10
  • 191
  • 288