I am trying to run a bootstrapped t-test in R. I have a sample of 50 participants, 39 of whom are female. I have a dependent variable, d', and want to see whether males and females differ on it. As I only have 11 male participants, I want to use a bootstrapped t-test (not the best idea, but I've seen it in the literature).

I have a database called "data" with several variables. So, first I extracted two vectors:

dPrimeFemales <- subset(data, Gender == "F",
                        select = c(dPrime))

dPrimeMales <- subset(data, Gender == "M",
                      select = c(dPrime))

Then, I tried several things found on the internet (and here). Based on this post I tried:

set.seed(1315)
B      <- 1000
t.vect <- vector(length = B)
p.vect <- vector(length = B)
for (i in 1:B) {
  boot.c <- sample(dPrimeFemales, size = nrow(dPrimeFemales), replace = TRUE)
  boot.p <- sample(dPrimeMales, size = nrow(dPrimeMales), replace = TRUE)
  ttest  <- t.test(boot.c, boot.p)
  t.vect[i] <- ttest$statistic
  p.vect[i] <- ttest$p.value
}

But it says:

Error: Must use a vector in `[`, not an object of class matrix.
Call `rlang::last_error()` to see a backtrace

I also tried this: boot.t.test: Bootstrap t-test

First, I couldn't load the functions. So, I copy-pasted and ran this:

Bootstrap Function

Then I ran this:

boot.t.test(x = dPrimeFemales, y = dPrimeMales)

But, it says this:

Error in boot.t.test(x = dPrimeFemales, y = dPrimeMales) : 
  dims [product 1] do not match the length of object [1000]
In addition: There were 50 or more warnings (use warnings() to see the first 50)

If I use warnings() it says:

1: In mean.default(x) : argument is not numeric or logical: returning NA
2: In mean.default(y) : argument is not numeric or logical: returning NA
3: In mean.default(c(x, y)) : argument is not numeric or logical: returning NA
4: In mean.default(x) : argument is not numeric or logical: returning NA
5: In mean.default(y) : argument is not numeric or logical: returning NA

Etc...

To be clearer, I am thinking of something like the bootstrapped t-test in SPSS, like this: [image: SPSS bootstrapped t-test output]

I thought this was going to be much easier. Any help is welcome.

Thank you all for your time.

structure(list(dPrime = c(0.60805224661517, 0.430727299295457, 
-0.177380196159658, 0.771422126383253, 0.598621304083563, 0, 
0.167894004788105, -0.336998837042929, 0.0842422708809764, -0.440748778800912, 
0.644261556974516, -0.167303467814258, 0.169695369228671, -0.251545738695235, 
0.0842422708809764, -0.0985252105020469, -0.239508275220057, 
-0.143350050535084, 0.430727299295457, 0.757969499665785, -0.282230896122292, 
-0.271053409572241, -0.090032472207662, -0.090032472207662, 0.524400512708041, 
-0.218695510362827, -0.271053409572241, 1.07035864674857, 0.262833294507352, 
0.421241107923905, -0.0836517339071291, 0.090032472207662, -0.598621304083563, 
-0.356506507919935, 0.474566187745845, 0.336998837042929, 1.35083901409173, 
-0.336998837042929, -0.443021053393661, 0.757969499665785, -0.841621233572914, 
0.167303467814258, 0.167894004788105, 0.090032472207662, -0.177380196159658, 
0.251545738695235, -0.344495842891614, -0.17280082229969, -0.440748778800912, 
0), Gender = c("F", "F", "F", "F", "F", "F", "F", "F", "M", "M", 
"F", "F", "F", "F", "F", "F", "F", "F", "M", "F", "M", "M", "F", 
"F", "F", "F", "F", "F", "F", "F", "M", "F", "F", "F", "M", "F", 
"F", "F", "F", "M", "M", "F", "F", "M", "M", "F", "F", "F", "F", 
"F")), row.names = c(NA, -50L), class = c("tbl_df", "tbl", "data.frame"
))
StupidWolf
Glu
  • I'm unclear what exactly you intend to bootstrap. The t statistics? – Roland Sep 09 '19 at 14:48
  • Can you share the structure of `data`? Where exactly is the error? – Roman Luštrik Sep 09 '19 at 14:49
  • Yes, the t-statistic. I'll try and upload the data (trying)... – Glu Sep 09 '19 at 14:54
  • You should be able to get a sample of the data here: https://www.filehosting.org/file/details/820987/data.Rda (I don't know whether there is a better way of sharing it here). Also, I tried to make clearer what I mean by adding a picture of the analysis I am trying to achieve, as run in SPSS. – Glu Sep 09 '19 at 15:01

1 Answer

Here's an example of using that function with simulated data in which both groups are drawn from the same distribution, so you would not expect a significant difference. There is no need to subset the data beforehand and create intermediate objects.

set.seed(0)
df <- data.frame(gender = sample(c('M', 'F'), size=50, replace=T),
                 measure = runif(n=50))

boot.t.test(df[df$gender=='M', 'measure'], df[df$gender=='F', 'measure'], reps=1000)

Bootstrap Two Sample t-test


t = -0.186, p-value = 0.859
Alternative hypothesis: true difference in means is not equal to 0

$mu0 
[1] 0

$statistic
[1] -0.1863362

$alternative
[1] "two.sided"

$p.value
[1] 0.859
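
If the copied `boot.t.test` function is not at hand, the general idea can be sketched in base R. The helper below, `boot_t_sketch`, is a hypothetical name and is not the `nonpar` implementation. It assumes one common approach: shift both groups to the combined mean so the null hypothesis holds, resample each group with replacement, and compare the resampled Welch t statistics with the observed one. Treat it as a rough illustration only.

```r
# Hypothetical helper, NOT the nonpar::boot.t.test implementation.
# Assumption: bootstrap under the null by centring both groups on the combined mean.
boot_t_sketch <- function(x, y, reps = 1000) {
  # Inputs must be plain numeric vectors, not one-column data frames or tibbles
  stopifnot(is.numeric(x), is.numeric(y), length(x) > 1, length(y) > 1)

  # Observed Welch t statistic
  t_obs <- unname(t.test(x, y)$statistic)

  # Impose the null hypothesis: give both groups the same mean
  grand <- mean(c(x, y))
  x0 <- x - mean(x) + grand
  y0 <- y - mean(y) + grand

  # Resample each group with replacement and recompute the t statistic
  t_boot <- replicate(reps, {
    xb <- sample(x0, replace = TRUE)
    yb <- sample(y0, replace = TRUE)
    unname(t.test(xb, yb)$statistic)
  })

  # Two-sided bootstrap p-value: share of resampled |t| at least as large as observed
  list(statistic = t_obs, p.value = mean(abs(t_boot) >= abs(t_obs)))
}

set.seed(0)
df <- data.frame(gender = sample(c("M", "F"), size = 50, replace = TRUE),
                 measure = runif(n = 50))
res <- boot_t_sketch(df$measure[df$gender == "M"],
                     df$measure[df$gender == "F"],
                     reps = 1000)
res$statistic
res$p.value
```

The `stopifnot(is.numeric(...))` guard is deliberate: it fails immediately if a data frame or tibble is passed instead of a vector, rather than producing a long run of `mean.default` warnings.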
Bill O'Brien
  • Hi Bill, this looks great, but could you tell me which packages you need? I was trying something similar (I think), but I get: Error in boot.t.test(df[data$Gender == "M", "measure"], df[data$Gender == : could not find function "boot.t.test". There is clearly some package or library that I am missing. – Glu Sep 09 '19 at 15:19
  • I couldn't get the tpepler/nonpar package to install either (https://rdrr.io/github/tpepler/nonpar/src/R/boot.t.test.R), so I just copied and pasted the function into my workspace. Not ideal, but enough to test it out. – Bill O'Brien Sep 09 '19 at 15:23
  • Sorry, I was just trying what you suggested. So, this is what I tried: set.seed(0) boot.t.test(data[data$Gender=='M', 'dPrime'], data[data$Gender=='F', 'dPrime'], reps=1000) I get this error: Error in boot.t.test(data[data$Gender == "M", "dPrime"], data[data$Gender == : dims [product 1] do not match the length of object [1000] In addition: There were 50 or more warnings (use warnings() to see the first 50). Which is similar to what I was getting before (see my initial post) – Glu Sep 09 '19 at 15:31
  • BTW, if I try your example it works. I really don't understand why it doesn't work with my data. – Glu Sep 09 '19 at 15:50
  • Can't answer because filehosting.org is blocked at my workplace. If you use dput() to post your data in the question above, I'm happy to help further. When you run data[data$Gender=='M', 'dPrime'], does it return a numeric vector of the expected length? – Bill O'Brien Sep 09 '19 at 15:51
  • Ok, I just tried. I created a new dataset with only dPrime and Gender so as not to share any private info. If I run what you suggested, I get an 11x1 tibble, which is what I expect: dPrime for 11 males. By the way, thank you again, I really appreciate it. – Glu Sep 09 '19 at 15:57
  • I think I found the problem. I can't believe it: it looks like my "data" was not really a data.frame. If I type str(data) it returns: Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 50 obs. of 27 variables. If I convert to a data.frame, that is: dataAsDb = as.data.frame(data), then: boot.t.test(dataAsDb[dataAsDb$Gender=='M', 'dPrime'], dataAsDb[dataAsDb$Gender=='F', 'dPrime'], reps=1000) it works... – Glu Sep 09 '19 at 16:15
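
The behaviour described in the last comment can be reproduced without converting the whole object. For a tibble, `[` with a single column name keeps the result as a one-column table instead of dropping it to a vector, whereas `[[` and `$` return the underlying vector for both tibbles and plain data frames. The sketch below uses made-up toy values and a plain data frame with `drop = FALSE` to mimic the tibble result, so it runs without the tibble package. The object names are illustrative only.

```r
# Toy values for illustration only (not the asker's data)
d <- data.frame(dPrime = c(0.61, 0.43, -0.18, 0.08, -0.44),
                Gender = c("F", "F", "F", "M", "M"),
                stringsAsFactors = FALSE)

# drop = FALSE mimics what `[` returns on a tibble: a one-column table
males_tbl_like <- d[d$Gender == "M", "dPrime", drop = FALSE]
is.numeric(males_tbl_like)   # FALSE: it is still a data frame

# `[[` (or `$`) extracts the column as a vector, for tibbles and data frames alike
males_vec <- d[["dPrime"]][d$Gender == "M"]
is.numeric(males_vec)        # TRUE
length(males_vec)            # 2
```

Passing `males_vec` rather than the one-column table gives `mean()` and `sample()` the numeric vector they expect, which is presumably why converting with `as.data.frame()` made the original call work.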