
I'm giving part of a presentation on numerical integration. While the talk itself will cover better forms of numerical integration (mainly importance sampling and stratified sampling), in my section I also mention Monte Carlo integration by sampling from the uniform distribution.

I've found that:

mean(sin(runif(1e8, 0, pi)))

is giving an answer of 0.636597, rather than the 1 that is expected. This answer stays pretty consistent as the sample size increases, and I'm unsure why there's so much error. Other computations such as:

mean(sin(runif(1e6, 0, 2 * pi)))

give 0.0005398996, much closer to the expected answer of 0.

Can someone help me see why

mean(sin(runif(1e8, 0, pi)))

is giving such an inaccurate answer? Is this user error, or is it to be expected when sampling from the uniform distribution?

Zheyuan Li
Mark
    @ZheyuanLi In writing out an answer to you, I've realized I've forgotten to multiply the result by the length of the interval, which wouldn't be detected when the integral is expected to be 0. Thanks! – Mark Dec 05 '16 at 09:43

1 Answer


I came back to make my answer complete, in case future readers need to know the logic. Note that the true value is 2, not 1 as stated in your question.

True value

$$\int_0^\pi \sin(x)\,dx = \left[-\cos(x)\right]_0^\pi = 2$$

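Monte Carlo

$$\int_a^b f(x)\,dx \approx (b - a)\cdot\frac{1}{n}\sum_{i=1}^{n} f(x_i), \qquad x_i \sim U(a, b)$$

So you computed the mean of the function values at the samples, but forgot to multiply by the interval length.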

set.seed(0); pi * mean(sin(runif(1000, 0, pi)))
# [1] 2.001918

is what you need.
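
As a hedged sketch, the same estimator can be wrapped in a small helper that also reports a standard error. The name `mc_uniform` is hypothetical (it is not a base R function). Without the `(b - a)` factor, the sample mean converges to 2/pi ≈ 0.6366, which matches the value reported in the question.

```r
# Plain Monte Carlo integration of f over [a, b] using uniform samples.
# mc_uniform is an illustrative helper, not a base R function.
mc_uniform <- function(f, a, b, n) {
  fx <- f(runif(n, a, b))                     # function values at uniform draws
  list(
    estimate  = (b - a) * mean(fx),           # interval length times mean value
    std_error = (b - a) * sd(fx) / sqrt(n)    # Monte Carlo standard error
  )
}

set.seed(0)
res <- mc_uniform(sin, 0, pi, 1e5)
res$estimate    # close to the true value 2
res$std_error   # shrinks like 1 / sqrt(n)
```

The standard error gives a rough idea of how far the estimate may be from the true value for a given n.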


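A deterministic view of this result is the mean value theorem for integrals, or the Riemann sum approximation of the integral.

Riemann

$$\int_a^b f(x)\,dx \approx \frac{b - a}{n}\sum_{i=1}^{n} f(x_i), \qquad x_i \text{ evenly spaced on } [a, b]$$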

So we can also do

pi * mean(sin(seq(0, pi, length = 1000)))
# [1] 1.997998
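
Note that the grid above includes both endpoints, so 1000 points span only 999 subintervals; dividing by 1000 is part of why the result comes out slightly below 2. A minimal sketch of a midpoint Riemann sum, which avoids this, might look as follows. The helper name `riemann_mid` is hypothetical, and `f` is assumed to be vectorized.

```r
# Midpoint Riemann sum of f over [a, b] with n equal subintervals.
# riemann_mid is an illustrative helper, not a base R function.
riemann_mid <- function(f, a, b, n) {
  h <- (b - a) / n                        # subinterval width
  mids <- a + (seq_len(n) - 0.5) * h      # midpoint of each subinterval
  h * sum(f(mids))                        # width times sum of function values
}

riemann_mid(sin, 0, pi, 1000)   # deterministic; very close to 2
```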

Monte Carlo integration is more useful via importance sampling. Read "Monte Carlo integration using importance sampling given a proposal function" for a good example.
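
For this particular integral, a minimal sketch of importance sampling could look like the following. The proposal is an assumption chosen purely for illustration: a Beta(2, 2) variable rescaled to [0, pi], whose density roughly follows the shape of sin(x).

```r
# Importance sampling for the integral of sin(x) over [0, pi].
# Proposal (an assumption for illustration): x = pi * u with u ~ Beta(2, 2),
# whose density on [0, pi] is dbeta(x / pi, 2, 2) / pi.
set.seed(0)
n <- 1e5
x <- pi * rbeta(n, 2, 2)            # draws from the proposal
g <- dbeta(x / pi, 2, 2) / pi       # proposal density at the draws
w <- sin(x) / g                     # importance-weighted integrand
is_estimate  <- mean(w)             # estimates the integral (true value 2)
is_std_error <- sd(w) / sqrt(n)     # Monte Carlo standard error
c(is_estimate, is_std_error)
```

Because the weights `w` stay within a narrow range here, the standard error should come out noticeably smaller than with uniform sampling at the same n.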

Zheyuan Li
  • Earlier I mention various Gaussian quadrature methods, so the fact that this is non-deterministic is important for the transition from those to more sophisticated Monte Carlo methods. The speed and accuracy of this code aren't important at all (and being inaccurate for small n may even be good, as it will demonstrate why uniform sampling needs to be improved on). – Mark Dec 05 '16 at 10:25