Wrong result when doing simple Monte Carlo integration in R

Question

I'm giving part of a presentation on numerical integration. While the talk itself will go into better forms of numerical integration (mainly importance sampling and stratified sampling), in my section I mention Monte Carlo integration using samples from the uniform distribution.

I've found that:

mean(sin(runif(1e8, 0, pi)))

is giving an answer of 0.636597 rather than the expected 1. The result stays about the same as I increase the sample size, and I'm unsure why there's so much error. Other computations such as:

mean(sin(runif(1e6, 0, 2 * pi)))

give 0.0005398996, much closer to the expected answer of 0.

Can someone help me see why

mean(sin(runif(1e8, 0, pi)))

is giving such an inaccurate answer? Is this user error, or is it to be expected when sampling from the uniform distribution?

@ZheyuanLi In writing out a reply to you, I realized I had forgotten to multiply the result by the length of the interval, a mistake that goes unnoticed when the integral is expected to be 0. Thanks! — Mark, Dec 05 '16 at 09:43

score 4 · Accepted Answer · edited May 23 '17 at 10:30


I came back to make my answer complete, in case future readers need to know the logic. Note that the true value is 2, not 1 as stated in your question.

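So you computed the mean of the function values at the samples but forgot to multiply by the interval length. If U is uniform on [a, b], the integral of f over [a, b] equals (b - a) * E[f(U)], so the sample mean on its own estimates the average value of sin on [0, pi], which is 2 / pi ≈ 0.6366, the number you observed.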

set.seed(0); pi * mean(sin(runif(1000, 0, pi)))
# [1] 2.001918

is what you need.
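Wrapping the fix in a small function makes the interval-length factor explicit and also gives a way to show how the error depends on the sample size. The sketch below uses a hypothetical helper `mc_integrate` (not part of base R or any package); it assumes the usual plain Monte Carlo standard error, the standard deviation of the scaled terms divided by sqrt(n).

```r
# Hypothetical helper (not part of base R or any package): plain Monte Carlo
# estimate of the integral of f over [a, b] from n Uniform(a, b) draws,
# returned together with its standard error.
mc_integrate <- function(f, a, b, n) {
  fx <- (b - a) * f(runif(n, a, b))   # each term is an unbiased estimate of the integral
  c(estimate = mean(fx), std.error = sd(fx) / sqrt(n))
}

set.seed(1)
mc_integrate(sin, 0, 2 * pi, 1e6)     # true value: 0

# The standard error shrinks like 1 / sqrt(n): about tenfold per hundredfold more draws.
for (n in c(1e2, 1e4, 1e6)) {
  print(mc_integrate(sin, 0, pi, n))  # true value: 2
}
```

For this integrand the standard deviation of pi * sin(U) is pi * sqrt(1/2 - 4/pi^2), about 0.97, so the standard error is roughly 0.97 / sqrt(n). That makes the slow improvement of uniform sampling easy to show on a slide.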

A deterministic view of this result is the mean value theorem for integrals, or a Riemann sum approximation of the integral.
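For continuous f, these views share one form: the integral is the interval length times an average value of f, and the two approximations differ only in whether the points are evenly spaced or random. Here c is the point given by the mean value theorem, the x_i are evenly spaced in [a, b], and the U_i are independent Uniform(a, b) draws:

$$
\begin{aligned}
\int_a^b f(x)\,dx &= (b-a)\,f(c) && \text{(mean value theorem, } c \in [a, b])\\
&\approx \frac{b-a}{n}\sum_{i=1}^{n} f(x_i) && \text{(Riemann sum, evenly spaced } x_i)\\
&\approx \frac{b-a}{n}\sum_{i=1}^{n} f(U_i) && \text{(Monte Carlo, } U_i \sim \mathrm{Uniform}(a, b))
\end{aligned}
$$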

So we can also do

pi * mean(sin(seq(0, pi, length.out = 1000)))
# [1] 1.997998
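That grid includes both endpoints, so its 1000 points bound only 999 subintervals of width pi / 999; weighting every point by pi / 1000 is why the result sits slightly below 2. One alternative, sketched below, is a midpoint-rule Riemann sum that places one point at the center of each of `n` equal subintervals (the variable names are illustrative).

```r
# Sketch: midpoint-rule Riemann sum for the integral of sin over [0, pi].
n   <- 1000
h   <- pi / n                                    # width of each subinterval
mid <- seq(h / 2, pi - h / 2, length.out = n)    # midpoints of the n subintervals
h * sum(sin(mid))                                # very close to the true value 2

# Weighting the original endpoint grid by pi / 999 instead of pi / 1000
# (the trapezoidal rule, since sin is 0 at both ends) is also much closer to 2.
pi / 999 * sum(sin(seq(0, pi, length.out = 1000)))
```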

Monte Carlo integration becomes much more useful with importance sampling. See the question "Monte Carlo integration using importance sampling given a proposal function" for a good example.
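A minimal importance sampling sketch for the same integral, next to plain uniform sampling for comparison. The proposal is an assumption made here for illustration, not taken from the question or the linked example: `X = pi * Y` with `Y ~ Beta(2, 2)`, whose density on [0, pi] is `g(x) = 6 x (pi - x) / pi^3`, roughly the shape of `sin`.

```r
# Sketch: importance sampling for the integral of sin over [0, pi].
# Assumed proposal (illustrative choice): X = pi * Y with Y ~ Beta(2, 2).
set.seed(0)
n <- 1e5

g <- function(x) 6 * x * (pi - x) / pi^3    # proposal density on [0, pi]
x <- pi * rbeta(n, 2, 2)                     # draws from the proposal
w <- sin(x) / g(x)                           # integrand divided by proposal density
c(estimate = mean(w), std.error = sd(w) / sqrt(n))

# Plain uniform sampling for comparison: each term is pi * sin(U).
u <- pi * sin(runif(n, 0, pi))
c(estimate = mean(u), std.error = sd(u) / sqrt(n))
```

Because `sin(x) / g(x)` stays between about 1.64 and 2.09 on (0, pi), the weighted terms vary far less than `pi * sin(U)`, so the importance sampling standard error is several times smaller for the same `n`.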


answered Dec 05 '16 at 10:06

Zheyuan Li


I mention various Gaussian quadrature methods earlier, so this being non-deterministic is important for the transition between those and more sophisticated Monte Carlo methods. The speed/accuracy of this code isn't important at all (and being inaccurate for small n may even be good, as it will demonstrate why uniform sampling needs to be improved on). – Mark Dec 05 '16 at 10:25
