I'm sorry for asking this here, but the course website has no discussion page and it directs students to Stack Overflow for questions. This is from this edX course.

Q1: Using the following dataset:

```
library(downloader)  # provides download()
url <- "https://raw.githubusercontent.com/genomicsclass/dagdata/master/inst/extdata/babies.txt"
filename <- basename(url)
download(url, destfile=filename)
babies <- read.table(filename, header=TRUE)
```

split the birth weights into two groups (non-smoking and smoking):

library(dplyr)  # provides filter(), select(), and %>%
bwt.nonsmoke <- filter(babies, smoke==0) %>% select(bwt) %>% unlist
bwt.smoke <- filter(babies, smoke==1) %>% select(bwt) %>% unlist

Set the seed at 1 and obtain a sample from the non-smoking mothers (dat.ns) of size N=25. Then, without resetting the seed, take a sample of the same size from the smoking mothers (dat.s). Compute the t-statistic (call it tval).

What is the absolute value of the t-statistic?

Here's how I did it:

set.seed(1)
dat.ns <- sample(bwt.nonsmoke,25)
dat.s <- sample(bwt.smoke,25)
tval <- t.test(dat.ns,dat.s)$statistic
tval

This gives the value 2.120904, which is apparently wrong. I also tried setting the seed to 1 before each sample, as follows:

set.seed(1)
dat.ns <- sample(bwt.nonsmoke,25)
set.seed(1)
dat.s <- sample(bwt.smoke,25)
tval <- t.test(dat.ns,dat.s)$statistic
tval

which gives a t value of 1.573627, which is also wrong. I'm not sure what I'm doing wrong and I'd like some help.

Len Greski

1 Answer

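Random sampling in R changed significantly at version 3.6.0: the default method that `sample()` uses to draw from a discrete uniform distribution switched from "Rounding" to "Rejection", as highlighted in the R Bloggers article What's new in R 3.6.0? The same seed therefore produces different samples before and after that version.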

If you're using a pre-3.6.0 version of R (emulated here with `RNGversion("3.5.3")`), you'll get the following t-test statistic based on your code:

> RNGversion("3.5.3")
Warning message:
In RNGkind("Mersenne-Twister", "Inversion", "Rounding") :
  non-uniform 'Rounding' sampler used
> set.seed(1)
> dat.ns <- sample(bwt.nonsmoke,25)
> dat.s <- sample(bwt.smoke,25)
> t.test(dat.ns,dat.s)

    Welch Two Sample t-test

data:  dat.ns and dat.s
t = 2.1209, df = 47.693, p-value = 0.03916
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  0.5141953 19.3258047
sample estimates:
mean of x mean of y 
   124.68    114.76 

If you use R 3.6.0 or newer, you get the following answer with the same code.

> # redo with RNGversion("3.6.3")
> RNGversion("3.6.3")
> set.seed(1)
> dat.ns <- sample(bwt.nonsmoke,25)
> dat.s <- sample(bwt.smoke,25)
> t.test(dat.ns,dat.s)

    Welch Two Sample t-test

data:  dat.ns and dat.s
t = 1.6593, df = 47.58, p-value = 0.1036
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.79772 18.75772
sample estimates:
mean of x mean of y 
   125.12    116.64 
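
In both runs `t.test()` is called with its default `var.equal = FALSE`, so the reported `t` is the Welch statistic: the difference in sample means divided by the square root of the sum of each sample variance over its sample size. A minimal sketch of that arithmetic follows, using simulated stand-in vectors `x` and `y` (hypothetical data, not the `babies` samples), with `abs()` applied because the quiz asks for the absolute value:

```r
# Simulated stand-ins for dat.ns and dat.s (assumption: not the real data).
set.seed(1)
x <- rnorm(25, mean = 123, sd = 17)
y <- rnorm(25, mean = 114, sd = 18)

# Welch t-statistic computed by hand.
se <- sqrt(var(x) / length(x) + var(y) / length(y))
tval_manual <- (mean(x) - mean(y)) / se

# t.test() returns a named numeric; unname() drops the "t" label.
tval <- unname(t.test(x, y)$statistic)

abs(tval_manual)              # the quantity the quiz asks for
all.equal(tval_manual, tval)  # compares the manual and t.test() values
```

Replacing `x` and `y` with `dat.ns` and `dat.s` applies the same check to the actual samples.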

Bottom line: check the version of R that was used to create the quiz answers to confirm the version of the random number generator.
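
One way to make that check is to inspect the active settings with `RNGkind()` and compare a small draw under both samplers. The sketch below assumes R 3.6.0 or newer (where `RNGkind()` reports a third `sample.kind` element); the variable names `old_kinds`, `draw_old`, and `draw_new` are only illustrative:

```r
# Report the running R version and the active RNG settings.
# In R >= 3.6.0 the third element is the sample.kind:
# "Rounding" is the pre-3.6.0 sampler, "Rejection" the newer default.
R.version.string
old_kinds <- RNGkind()  # remember the current settings
old_kinds

# Draw the same sample under the old sampler...
suppressWarnings(RNGversion("3.5.3"))  # warns about the 'Rounding' sampler
set.seed(1)
draw_old <- sample(1:100, 5)

# ...and under the sampler introduced in 3.6.0.
RNGversion("3.6.0")
set.seed(1)
draw_new <- sample(1:100, 5)

# Do the two samplers give the same draw for the same seed?
identical(draw_old, draw_new)

# Restore whatever was active before the comparison.
suppressWarnings(RNGkind(kind = old_kinds[1],
                         normal.kind = old_kinds[2],
                         sample.kind = old_kinds[3]))
```

If `old_kinds` shows `"Rounding"` on an R 3.6.x installation, the session is still using the old sampler despite the newer R version.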

Len Greski
  • Thanks, using `RNGversion("3.6.3")` gave me the correct answer. I'm not sure why it was using the old RNG version, though, since the R version I have installed is 3.6.3. Is there a way to change the default to the new version instead? – Advait Bhagwat Apr 20 '20 at 12:28
  • @AdvaitBhagwat - Yes, you can change your .Rprofile file to always use `RNGversion('3.6.3')`. Instructions explaining how to do this are in the *R Bloggers* article [Fun with .Rprofile and customizing R startup](https://www.r-bloggers.com/fun-with-rprofile-and-customizing-r-startup/). – Len Greski Apr 20 '20 at 21:26
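
A minimal sketch of what such a startup file could contain, assuming a user-level `.Rprofile` in the home directory (the file location is an assumption here, not something taken from the linked article):

```r
# ~/.Rprofile (hypothetical contents): runs at the start of every R session.
# Select the R 3.6.x defaults, which use the "Rejection" sampler in sample().
RNGversion("3.6.3")
```

After restarting R, `RNGkind()` should list `"Rejection"` as its third element. If the old sampler still comes back, one thing worth checking is whether a restored workspace (`.RData`) carries a saved `.Random.seed`, since that object also records the RNG kind.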