Get p-value with two variables and multiple row names

Question

I wondered if you could help me compute p-values from this simple data.frame. My data frame is called my_data. Printing it shows the values I am comparing:

my_data <- read.csv("densityleftOK.csv", stringsAsFactors = FALSE)[c(1,2,3),]

    P1   P2  P3   P4  P5  T1  T2  T3  T4  T5  T6
A 1008 1425 869 1205 954 797 722 471 435 628 925
B  550  443 317  477 337 383  54 111  27 239 379
C  483  574 597  375 593 553 249 325 238 354 411

I would like to get a single p-value for each row by comparing the placebo (P) samples against the treated (T) samples. If you don't mind, I'd also like to get the standard deviation within each group, placebo and treated.

I appreciate any help. Thanks

Sorry, but you give us only the means of the five groups A:E. For p-values and standard deviation the complete data are required. If you show us the data structure someone can give you the R code to compute it yourself. — Jan, May 09 '20 at 09:40
Thanks @Jan for your reply. I edited the data by adding the complete one. I think that now it is ok. Thanks :) — Dave91, May 09 '20 at 10:36
What kind of test do you want to use to get a p-value? A p-value is nothing more than the probability of seeing data at least as extreme as yours given a certain assumption. From your data I assume you want to establish that the treatment and placebo values come from different populations. In plain English, you want to show that they are different. If that is the case, I need to know whether the treatment and placebo samples are repeated measures or not. I assume they are independent samples, right? Finally, it would be useful to know if the treatment is supposed to increase or lower the measurements. — Jan, May 09 '20 at 11:17
Yes, I do want to show that they are different. The columns are individual tissue samples; the rows are different tissue regions. The treatment is supposed to lower the measurements. — Dave91, May 09 '20 at 11:22

score 1 · Accepted Answer · answered May 09 '20 at 11:44

You can try something like the code below: pivot the data into long format, group by the row ids, introduce a grouping vector ("P" or "T"), and use tidy on t.test to wrap the results up in a table:

library(broom)
library(tidyr)
library(dplyr)
library(tibble)

data = read.table(text="P1    P2   P3  P4  P5   T1  T2  T3  T4  T5  T6
A     1008 1425 869 1205 954  797 722 471 435 628 925
B      550  443 317  477 337  383  54 111  27 239 379
C      483  574 597  375 593  553 249 325 238 354 411",header=TRUE,row.names=1)

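# Reshape to long format, label each sample "P" or "T", then run one t-test per row id
res = data %>%
  rownames_to_column("id") %>%
  pivot_longer(-id) %>%
  mutate(grp = sub("[0-9]+$", "", name)) %>%  # "P1" -> "P", "T6" -> "T"
  group_by(id) %>%
  do(tidy(t.test(value ~ grp, data = .))) %>%
  select(c(id, estimate, estimate1, estimate2, statistic, p.value)) %>%
  mutate(stderr = estimate / statistic)  # standard error of the difference in means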

# A tibble: 3 x 7
# Groups:   id [3]
  id    estimate estimate1 estimate2 statistic p.value stderr
  <chr>    <dbl>     <dbl>     <dbl>     <dbl>   <dbl>  <dbl>
1 A         429.     1092.      663       3.40 0.00950  126. 
2 B         226.      425.      199.      2.89 0.0192    78.2
3 C         169.      524.      355       2.65 0.0266    64.0

If you'd rather not use packages, it's a matter of using apply, and I guess it's easier to declare the groups up front:

grp = gsub("[0-9]","",colnames(data))

res = apply(data, 1, function(i) {
  data.frame(t.test(i ~ grp)[c("statistic", "p.value", "stderr")])
})

res = do.call(rbind,res)
  statistic     p.value    stderr
A  3.395303 0.009498631 126.40994
B  2.890838 0.019173060  78.16650
C  2.646953 0.026608838  63.99812
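
The question also asked for the standard deviation within each group. Note that the stderr column above is the standard error of the difference in means, not a per-group standard deviation. A minimal base R sketch for the per-group SDs, assuming the same data and grouping as above (the name sds and the sd_ column prefix are just illustrative):

```r
# Same data as above, rebuilt here so the snippet runs on its own
data <- read.table(text = "P1 P2 P3 P4 P5 T1 T2 T3 T4 T5 T6
A 1008 1425 869 1205 954 797 722 471 435 628 925
B 550 443 317 477 337 383 54 111 27 239 379
C 483 574 597 375 593 553 249 325 238 354 411", header = TRUE, row.names = 1)

# "P1" -> "P", "T6" -> "T"
grp <- gsub("[0-9]", "", colnames(data))

# For each row (tissue region), compute sd() separately for the P and T columns.
# apply() returns one column per region, so transpose to get one row per region.
sds <- t(apply(data, 1, function(i) tapply(as.numeric(i), grp, sd)))
colnames(sds) <- paste0("sd_", colnames(sds))
sds
```

The result is a matrix with one row per region and the columns sd_P and sd_T, which can be combined with the t-test table via cbind if needed.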

That'll work. A few comments. First, take a look at your data. With the few samples you have, it is hard to establish that your data are normally distributed. A t-test only gives you decent results if we can assume that. I took a look and your data are very scattered. You may want to look at a non-parametric test or a sampling approach. — Jan, May 09 '20 at 12:37
Also consider the tissue regions. Do you need *all* of them to show improvement, or will you consider your treatment effective when only one of the regions does? In the former case you're good. In the latter, you have to correct for doing multiple tests. The easiest way to do it is a Bonferroni correction, which basically means that you multiply your p-values by the number of tissue regions. In that case only the first region comes out < 0.05. — Jan, May 09 '20 at 12:39
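Finally, we can assume a one-sided test because only an improvement through treatment counts. By default, the t-test in R assumes two-sided comparisons. Set the alternative argument of t.test accordingly (here alternative = "greater", since placebo is the first group and treatment should lower the values) and the results will be more in your favour. — Jan, May 09 '20 at 12:41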
@Jan, this seems like great statistical advice for the OP. You can certainly start a chatroom, leave it as more comments under the question, or post it as an answer of its own. Questions or discussions of this nature are more suited for https://stats.stackexchange.com/questions — StupidWolf, May 09 '20 at 12:49
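
The suggestions in the comments above (a one-sided test, a Bonferroni correction, and a non-parametric alternative) can be sketched in base R roughly as follows. This is only an illustration under the assumptions stated in the thread: independent samples, and treatment expected to lower the values. The column names in res are made up for this example, and the Wilcoxon rank-sum test is one possible non-parametric choice, not necessarily the one the commenter had in mind.

```r
# Same data as in the answer, rebuilt here so the snippet runs on its own
data <- read.table(text = "P1 P2 P3 P4 P5 T1 T2 T3 T4 T5 T6
A 1008 1425 869 1205 954 797 722 471 435 628 925
B 550 443 317 477 337 383 54 111 27 239 379
C 483 574 597 375 593 553 249 325 238 354 411", header = TRUE, row.names = 1)

grp <- gsub("[0-9]", "", colnames(data))

res <- t(apply(data, 1, function(i) {
  i <- as.numeric(i)
  c(
    # Default two-sided Welch t-test, as in the answer
    t_two_sided = t.test(i ~ grp)$p.value,
    # One-sided: "greater" means the first group level ("P") has the larger mean
    t_one_sided = t.test(i ~ grp, alternative = "greater")$p.value,
    # Non-parametric alternative: one-sided Wilcoxon rank-sum test
    wilcox_one_sided = wilcox.test(i ~ grp, alternative = "greater")$p.value
  )
}))
res <- as.data.frame(res)

# Bonferroni: multiply each p-value by the number of tests (capped at 1)
res$t_two_sided_bonf <- p.adjust(res$t_two_sided, method = "bonferroni")
res$t_one_sided_bonf <- p.adjust(res$t_one_sided, method = "bonferroni")
res
```

With the two-sided p-values from the answer, the Bonferroni-adjusted values are three times the originals, which is why only region A stays below 0.05 in that column.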
