Version of R random number generator impacts correctness of edx Statistics and R course answers

Question

I'm sorry for asking this here, but there is no discussion page for this course on the website, and it directs students to Stack Overflow for any questions. This is from an edX course.

Q1: Using the following dataset:

'''
library(downloader)  # download() comes from the downloader package
url <- "https://raw.githubusercontent.com/genomicsclass/dagdata/master/inst/extdata/babies.txt"
filename <- basename(url)
download(url, destfile=filename)
babies <- read.table(filename, header=TRUE)
'''

splitting into two groups (non-smoking and smoking):

library(dplyr)  # filter(), select() and %>%
bwt.nonsmoke <- filter(babies, smoke==0) %>% select(bwt) %>% unlist
bwt.smoke <- filter(babies, smoke==1) %>% select(bwt) %>% unlist

Set the seed at 1 and obtain a sample from the non-smoking mothers (dat.ns) of size N=25. Then, without resetting the seed, take a sample of the same size from the smoking mothers (dat.s). Compute the t-statistic (call it tval).

What is the absolute value of the t-statistic?

Here's how I did it:

set.seed(1)
dat.ns <- sample(bwt.nonsmoke,25)
dat.s <- sample(bwt.smoke,25)
tval <- t.test(dat.ns,dat.s)$statistic
tval

This gives the value 2.120904 which is apparently wrong. I also tried setting the seed to 1 before each sample as follows:

set.seed(1)
dat.ns <- sample(bwt.nonsmoke,25)
set.seed(1)
dat.s <- sample(bwt.smoke,25)
tval <- t.test(dat.ns,dat.s)$statistic
tval

which gives a t value of 1.573627, which is also wrong. I'm not sure what I'm doing wrong and I'd like some help.

Welcome to stack overflow. How do you know both of those answers are wrong? — Mark Neal, Apr 19 '20 at 19:04
Thanks, the course is set up so that it checks whether you have entered the correct answer by matching it with the answer key set by the instructor. — Advait Bhagwat, Apr 20 '20 at 12:30

Len Greski · Accepted Answer · 2020-04-19T19:19:08.273

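Random sampling in R changed significantly at R version 3.6.0, as highlighted in an R Bloggers article, What's new in R 3.6.0? Specifically, the default method used by `sample()` changed from "Rounding" to "Rejection", so the same seed now produces different samples than it did in earlier versions.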

If you're using a pre-3.6.0 version of R (emulated here with `RNGversion("3.5.3")`), you'll get the following t-test statistic based on your code:

> RNGversion("3.5.3")
Warning message:
In RNGkind("Mersenne-Twister", "Inversion", "Rounding") :
  non-uniform 'Rounding' sampler used
> set.seed(1)
> dat.ns <- sample(bwt.nonsmoke,25)
> dat.s <- sample(bwt.smoke,25)
> t.test(dat.ns,dat.s)

    Welch Two Sample t-test

data:  dat.ns and dat.s
t = 2.1209, df = 47.693, p-value = 0.03916
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  0.5141953 19.3258047
sample estimates:
mean of x mean of y 
   124.68    114.76

If you use R 3.6.0 or newer, you get the following answer with the same code.

> # redo with RNGversion("3.6.3")
> RNGversion("3.6.3")
> set.seed(1)
> dat.ns <- sample(bwt.nonsmoke,25)
> dat.s <- sample(bwt.smoke,25)
> t.test(dat.ns,dat.s)

    Welch Two Sample t-test

data:  dat.ns and dat.s
t = 1.6593, df = 47.58, p-value = 0.1036
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.79772 18.75772
sample estimates:
mean of x mean of y 
   125.12    116.64
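
The effect of the two sampling methods can also be seen without the babies data. The following is a minimal sketch, assuming R 3.6.0 or newer; the vector `x` and the helper `draw()` are made up for illustration and are not part of the course material. It draws 25 values with the same seed under each `sample.kind` and checks whether the draws match.

```r
# Hypothetical stand-in population; any vector would do.
x <- 1:1000

# Illustrative helper: draw 25 values with seed 1 under a given sampling method.
draw <- function(sample_kind) {
  # Selecting "Rounding" emits a warning about the non-uniform sampler,
  # so warnings are suppressed here.
  suppressWarnings(RNGkind(sample.kind = sample_kind))
  set.seed(1)
  sample(x, 25)
}

old_kinds <- RNGkind()           # remember the current settings
rounding  <- draw("Rounding")    # pre-3.6.0 behaviour
rejection <- draw("Rejection")   # default since 3.6.0
suppressWarnings(RNGkind(sample.kind = old_kinds[3]))  # restore the original method

# TRUE only if both methods produced the same 25 values.
identical(rounding, rejection)
```

If the comparison is `FALSE`, any statistic computed from the two samples, such as the t-statistic above, can differ as well.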

Bottom line: check the version of R that was used to create the quiz answers to confirm the version of the random number generator.
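
To see which settings your own session is using, something like the following sketch may help. It assumes R 3.6.0 or newer, where `RNGkind()` returns three values (generator, normal generator, sampling method); the variable name `kinds` is only illustrative.

```r
# Report the running R version.
R.version.string

# Current RNG settings: kind, normal.kind, sample.kind.
kinds <- RNGkind()
kinds

# The third element is the method used by sample().
# "Rejection" is the default since 3.6.0; "Rounding" is the older behaviour.
if (length(kinds) >= 3 && kinds[3] == "Rounding") {
  message("sample() is using the pre-3.6.0 'Rounding' method")
}
```

If the third value is "Rounding" on a 3.6.x installation, the session is reproducing pre-3.6.0 samples even though the installed R version is newer.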

Thanks, using `RNGversion("3.6.3")` gave me the correct answer. Although I'm not sure why it was using the old RNG version even though the R version I have installed is version 3.6.3. Is there a way to change the default to the new version instead? — Advait Bhagwat, Apr 20 '20 at 12:28
@AdvaitBhagwat - Yes, you can change your .Rprofile file to always use `RNGversion('3.6.3')`. Instructions explaining how to do this are in the *R Bloggers* article [Fun with .Rprofile and customizing R startup](https://www.r-bloggers.com/fun-with-rprofile-and-customizing-r-startup/). — Len Greski, Apr 20 '20 at 21:26
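
As a rough sketch of the `.Rprofile` approach mentioned in the comment above: the file location shown (`~/.Rprofile`) is an assumption, since R can also read a project-level `.Rprofile` from the working directory.

```r
# ~/.Rprofile  (assumed location)
# Pin the RNG settings to those of R 3.6.3 at the start of every session.
RNGversion("3.6.3")
```

After restarting R, `RNGkind()` can be used to confirm that the sampling method is "Rejection".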
