-1

I want to perform multiple ttests on data in the following format

first column is "id" with values (for example) 1,1,1,2,2,2

second column is "ratios" with values 0.2, 0.18, 0.3, 1.5, 1.4, 1.6

for each instance of "id" I want to test all ratio values against all the ratio values in the dataframe

Right now I have this

data <- read.delim("clipboard", stringsAsFactors=FALSE) ##data to test
dist <- as.numeric(readClipboard()) ##distribution to test against


data$Ratio.Mean.H.L <- NA
data$p.value <- NA


for (i in 1:nrow(data))
     if (nrow(data) > 1)
 {
 #welsh t-test
 t.test.result <- t.test(data$ratio[i],dist,
                         alternative = "two.sided",
                         mu = 0, 
                         paired = FALSE, 
                         var.equal = FALSE,
                         conf.level = 0.95)     
 #writes data into the data.frame
 data$p.value[i] <- t.test.result$p.value
 }

write.table(data, file="C:/R_Temp/t-test.txt", sep = "\t")

I know this does not work, for one I am not sure I am only testing rows that share the same "id". I am also manually entering the distribution to test against, which is all entries in the "ratio" column.

How do I do this correct? and add multiple testing correction (bonferroni)?

brtw
  • 1
  • 1
    Instead of describing what the data might look like and then using `read.delim("clipboard")`, can you just give us a sample data.frame? This is not something we can reproduce in order to help you. – r2evans Dec 13 '17 at 19:28
  • 2
    Hmm... if I understand correctly, you want to compare a single number (`data$ratio[i]`) against a vector of values (`dist`). But t-tests are for comparing two sets of numbers - comparing a single number to a set isn't going to give you a meaningful result. What question are you trying to answer here? – Matt Parker Dec 13 '17 at 19:29
  • `by(mtcars$mpg, mtcars$cyl, function(x) t.test(x, dist, paired=F, var.equal=F)$p.value)`? – r2evans Dec 13 '17 at 19:31

2 Answers2

0

I suspect that MattParker's comment is going to be the biggest thing here: you are comparing a single number with a vector, and t.test will complain about that. Since you suggested that you want to perform tests per grouping variable (id), so in base R you probably want to use a function like by (or split). (There are great methods within dplyr and data.table as well.)

Using mtcars as sample data, I'll try to mimic your data:

dat <- mtcars[c("cyl", "mpg")]
colnames(dat) <- c("id", "ratio")

It isn't clear what you mean to use for dist, so I'll use the naïve

dist <- 1:10

Now you can do:

by(dat$ratio, dat$id, function(x) t.test(x, dist, paired = FALSE)$p.value)
# dat$id: 4
# [1] 2.660716e-10
# ------------------------------------------------------------ 
# dat$id: 6
# [1] 4.826322e-09
# ------------------------------------------------------------ 
# dat$id: 8
# [1] 2.367184e-07

If you want/need to deal with more than just ratio at a time, you can alternatively do this:

by(dat, dat$id, function(x) t.test(x$ratio, dist, paired = FALSE)$p.value)
# dat$id: 4
# [1] 2.660716e-10
# ------------------------------------------------------------ 
# dat$id: 6
# [1] 4.826322e-09
# ------------------------------------------------------------ 
# dat$id: 8
# [1] 2.367184e-07

The results from the call to by are a class "by", which is really just a repackaged list with some extra attributes:

res <- by(dat, dat$id, function(x) t.test(x$ratio, dist, paired = FALSE)$p.value)
class(res)
# [1] "by"
str(attributes(res))
# List of 4
#  $ dim     : int 3
#  $ dimnames:List of 1
#   ..$ dat$id: chr [1:3] "4" "6" "8"
#  $ call    : language by.data.frame(data = dat, INDICES = dat$id, FUN = function(x) t.test(x$ratio,      dist, paired = FALSE)$p.value)
#  $ class   : chr "by"

So you can expand/access it however you would a list:

res[[1]]
# [1] 2.660716e-10
as.numeric(res)
# [1] 2.660716e-10 4.826322e-09 2.367184e-07
names(res)
# [1] "4" "6" "8"

(Realize that the different levels of dat$id are the integers 4, 6, and 8, so the names should correspond to your $id.)

Edit:

If you want the results in a data.frame, two options come to mind:

  1. Repeat the p-value for each and every row, resulting in a lot of duplication. I discourage this method for several reasons; if you need it at some point, I suggest using option 2 and then merge.
  2. Produce a data.frame with as many rows as unique id. Something like:

    do.call(rbind.data.frame,
            by(dat, dat$id, function(x) list(id=x$id[1], pv=t.test(x, dist, paired=F)$p.value)))
    #   id           pv
    # 4  4 1.319941e-03
    # 6  6 2.877065e-03
    # 8  8 6.670216e-05
    
r2evans
  • 141,215
  • 6
  • 77
  • 149
  • OK, I think there has been some misunderstanding. Let's use the mtcars dataframe as an example. If we use "cyl"="id" and "mpg"="ratio". I want to do a t-test on all of the ratios for each id, in other owrds compare all of the mpgs for a 4 cyl car. In mt cars there are 11 4 cyl cars. so the t-test would compare those 11 mpg measurements to my provided distribution "dist". – brtw Dec 13 '17 at 22:30
  • Was not finished, but hit return -it's my first time posting here. In my data I have thousands of t-tests to perform, and some of the "id"s only occur once, so they should be excluded from the t-test. Ideally I would write these results to a text file. – brtw Dec 13 '17 at 22:33
  • *"excluded from the t-test"* suggests simple filtering, such as `if (length(x)>1) ...`. *"to a text file"*: `writeLines` or `write.csv`? – r2evans Dec 14 '17 at 05:52
  • I'm getting there, slowly. – brtw Dec 14 '17 at 09:29
  • This works 'dat <- read.delim("clipboard", stringsAsFactors=FALSE) ##data to test dist <- as.numeric(readClipboard()) ##distribution to test against results <- by(dat$ratio, dat$id, function(x) if(length(x)>1) t.test(x, dist, paired = FALSE)$p.value)' but I don't know how to convert the results into data frame so I can write them to a .txt file. Even better would be to return the p.values to the original dataframe (dat) – brtw Dec 14 '17 at 09:31
0

OK, Sorry for the poorly defined question. I got help elsewhere and will post the script that worked for those that are interested. I want to calculate p-values for ratio changes in a proteomics experiment. To do this I make individual t-tests for all the ratio measurements for any given protein or PTM site.These measurements are compared to the median of all measurments (mu in the t.test function), or to the entire distribution of measurements. In one column I have "id"s which are unique for each entry, in the other column I have "values" (ratios). I will make t-tests comparing all "values" that occur with any given unique "id". for ease of use I paste the table into the script, rather than calling it from a file (it saves me a step).

data <- read.delim("clipboard", stringsAsFactors=FALSE) ##data to test(two columns "id" and "value") Log-transfrom ratios!!
summary(data)

med <- median(data$value)


# function for the id-grouped t-test 
calc_id_ttest <- function(d) #col1: id, col2:values
  {  
colnames(d) <- c("id", "value")  # reassign the column names

# calculate the number of values for each id
res_N <- as.data.frame(tapply(d$value, d$id, length)) 
colnames(res_N) <- "N"              
res_N$id <- row.names(res_N)

# calculate the number of values for each id
res_med <- as.data.frame(tapply(d$value, d$id, median)) 
colnames(res_med) <- "med"              
res_med$id <- row.names(res_med)

# calculate the pvalues 
res_pval <- as.data.frame(tapply(d$value, d$id, function(x)
{
  if(length(x) < 3) 
    {   # t test requires at least 3 samples
    NA
    } 
  else
    {
    t.test(x, mu=med)$p.value #t.test (Pearson)d$value with other distribution? alternative=less or greater
    }                         #d$value to compare with entire distribution
                              #mu=med for median of values for 1-sided test
}))

colnames(res_pval) <- "pval"  # nominal p value 
res_pval$id <- row.names(res_pval)
res_pval$adj.pval <- p.adjust(res_pval$pval, method = "BH")  #multiple testing correction also "bonferroni"

res <- Reduce(function(x,y)
{
merge(x,y, by = "id", all = TRUE)
}, 
list(res_N, res_med, res_pval))
return (res)
}

data_result <- calc_id_ttest(d = data)
write.table(data_result, file="C:/R_Temp/t-test.txt", quote = FALSE, row.names = FALSE, col.names = TRUE, sep = "\t")
brtw
  • 1