
I want to compare two datasets at different bins. My input data looks something like this:

dataIn <- read.table(text =
"bin_slots  val_cases   val_controls
A   0.075   0.05
A   0.252   0.276
A   0.338   0.41
A   0.911   0.983
A   0.912   0.809
A   0.965   0.917
A   1   1
A   1   1
A   0   0
A   1   0.983
A   0.398   0.681
A   0.606   0.431
B   0.58    0.608
B   0.729   0.773
B   0.871   0.879
B   1   1
B   0.297   0.282
B   0.673   0.737
B   0.807   0.803
B   0.838   0.824
B   0.633   0.658"
, header = TRUE)

Using the above dataset, I want to compare val_cases and val_controls for A, B, and so on. The output I would like to get looks like this:

bin_slots   p_value
A   0.416336774
B   0.066616655

Thanks a lot. Best wishes, Meraj

  • Are these actually paired data (i.e., each row is a matched pair of observations), or are the values for cases and controls not related? In either case, you probably want to use a two-way ANOVA, rather than making a lot of pairwise comparisons – Mark Peterson Oct 10 '16 at 16:12
  • Hi Mark, Yes it is paired data (pairing between cases and controls for each row). – Meraj Oct 10 '16 at 16:25

1 Answer

If the data are paired, you can either analyze the per-row difference (the same meaning as a paired t-test), as I do here, or add a column for "Individual" and run an ANOVA like the one further below (treating individual as a random variable, for interpretation purposes). Here, I add a column for the difference (using dplyr), then look at the summary of the fitted model. Dropping the intercept (the "- 1" in the formula) makes each coefficient the mean difference for one bin, tested against zero. For more control, save the result of lm and inspect it with aov or anova and their methods.

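library(dplyr)  # provides %>% and mutate()

# Per-row difference between cases and controls
ifPaired <-
  dataIn %>%
  mutate(diff = val_cases - val_controls)

# No intercept: one coefficient (the mean difference) per bin
lm(diff ~ bin_slots - 1
   , data = ifPaired) %>%
  summary()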

Outputs (in part):

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
bin_slotsA -0.006917   0.024850  -0.278    0.784
bin_slotsB -0.015111   0.028695  -0.527    0.605
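
For the per-bin table requested in the question, one option is to run a separate paired t-test within each bin. The following is a minimal base-R sketch, not part of the original answer: it rebuilds dataIn from the question's values so that it runs on its own, and the names bins, p_values, and result are illustrative. Note that t.test is two-sided by default (see its alternative argument) and, unlike the model above, estimates the variance separately within each bin, so its p-values will generally differ from the lm output.

```r
# Rebuild the question's data so the sketch is self-contained
dataIn <- data.frame(
  bin_slots = rep(c("A", "B"), times = c(12, 9)),
  val_cases = c(0.075, 0.252, 0.338, 0.911, 0.912, 0.965, 1, 1, 0, 1,
                0.398, 0.606,
                0.58, 0.729, 0.871, 1, 0.297, 0.673, 0.807, 0.838, 0.633),
  val_controls = c(0.05, 0.276, 0.41, 0.983, 0.809, 0.917, 1, 1, 0, 0.983,
                   0.681, 0.431,
                   0.608, 0.773, 0.879, 1, 0.282, 0.737, 0.803, 0.824, 0.658)
)

# Split the rows by bin, then run one paired t-test per bin
bins <- split(dataIn, dataIn$bin_slots)
p_values <- sapply(bins, function(d) {
  t.test(d$val_cases, d$val_controls, paired = TRUE)$p.value
})

# Collect the results in the layout asked for in the question
result <- data.frame(bin_slots = names(p_values),
                     p_value = unname(p_values))
print(result)
```

With hundreds of bins, the same code applies unchanged, although a bin with fewer than two rows, or with identical differences in every row, will make t.test stop with an error. Running that many tests also calls for a multiple-comparison correction, for example p.adjust(result$p_value, method = "BH").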

If, instead, the data are not paired, but rather just independent observations, convert the data to a long format (using tidyr here), then run an ANOVA with bin and group as predictors.

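library(dplyr)
library(tidyr)  # provides gather()

# Long format: one row per observation, with a group column
ifNotPaired <-
  dataIn %>%
  gather("group", "value", -bin_slots) %>%
  mutate(group = gsub("val_", "", group))

lm(value ~ group + bin_slots
   , data = ifNotPaired) %>%
  summary()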

Outputs (in part):

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)    0.61966    0.08110   7.641 2.87e-09 ***
groupcontrols  0.01043    0.09781   0.107    0.916    
bin_slotsB     0.09690    0.09882   0.981    0.333 
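
As a sketch of the "save the result of lm" suggestion, the unpaired model can be stored and then passed to anova. This version is an assumption-laden illustration rather than the answer's own code: it uses base R only, rebuilds dataIn from the question's values, and the names longData and fit are hypothetical.

```r
# Rebuild the question's data so the sketch is self-contained
dataIn <- data.frame(
  bin_slots = rep(c("A", "B"), times = c(12, 9)),
  val_cases = c(0.075, 0.252, 0.338, 0.911, 0.912, 0.965, 1, 1, 0, 1,
                0.398, 0.606,
                0.58, 0.729, 0.871, 1, 0.297, 0.673, 0.807, 0.838, 0.633),
  val_controls = c(0.05, 0.276, 0.41, 0.983, 0.809, 0.917, 1, 1, 0, 0.983,
                   0.681, 0.431,
                   0.608, 0.773, 0.879, 1, 0.282, 0.737, 0.803, 0.824, 0.658)
)

# Long format without tidyr: stack cases on top of controls
longData <- data.frame(
  bin_slots = rep(dataIn$bin_slots, times = 2),
  group = rep(c("cases", "controls"), each = nrow(dataIn)),
  value = c(dataIn$val_cases, dataIn$val_controls)
)

# Save the fitted model instead of piping it straight into summary()
fit <- lm(value ~ group + bin_slots, data = longData)

anova(fit)          # ANOVA table for group and bin_slots
coef(summary(fit))  # the same coefficient table that summary() prints
```

Keeping fit around also makes it possible to call confint(fit) or residuals(fit) without refitting.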
Mark Peterson
  • Thanks Mark. I am interested in running a paired t-test on my data. To clarify, each row of the input corresponds to a frequency bin, not to an individual. In the first bin (designated "A") there are 12 data points each for cases and controls. I have hundreds of such bins (the number of data points may vary between bins) and I want to get a p-value for each of them. I am a newbie to scripting. Thanks again! – Meraj Oct 11 '16 at 13:08