-1

Below is a simplified data frame from some qPCR data:

   sample                  exprFile reaction_conc
1       A   140701_2014-07-03-15-49            59
2       A   140701_2014-07-03-15-49            70
3    NC_1   140701_2014-07-03-15-49             2
4    NC_1   140701_2014-07-03-15-49             3
5    NC_1   140701_2014-07-03-15-49             2
6       A 140701_2_2014-07-01-19-07           200
7       A 140701_2_2014-07-01-19-07           202
8       B 140701_2_2014-07-01-19-07           300
9       B 140701_2_2014-07-01-19-07           322
10      B 140701_2_2014-07-01-19-07           333
11   NC_1 140701_2_2014-07-01-19-07             8
12   NC_1 140701_2_2014-07-01-19-07             8
13   NC_2 140701_2_2014-07-01-19-07             4
14      D   140701_2014-07-02-20-53            44
15   NC_2   140701_2014-07-02-20-53             0
16   NC_2   140701_2014-07-02-20-53             2
17   NC_2   140701_2014-07-02-20-53             1
18      A   140708_2014-07-08-19-20           100
19      A   140708_2014-07-08-19-20           108
20      A   140708_2014-07-08-19-20           111
21      D   140708_2014-07-08-19-20            88
22      D   140708_2014-07-08-19-20            80
23      E   140708_2014-07-08-19-20           645
24   NC_3   140708_2014-07-08-19-20             8
25   NC_3   140708_2014-07-08-19-20            12
26   NC_1   140708_2014-07-08-19-20             4
27   NC_2   140708_2014-07-08-19-20             0

Each exprFile is an experiment and I'd like to subtract controls (samples labeled NC*) from each sample within an experiment by taking the average of the control values (reaction_conc). Some experiments contain several types of controls. I'd like to create new columns with the subtracted values for each control type. Lastly, I'd like to create a column that determines which control type is the highest and subtracts that from the values.

I've likely confused you with this description (sorry!), so here's the expected output:

   sample                  exprFile reaction_conc minusNC_1 minusNC_2 minusNC_3   minusNC_highest
1       A   140701_2014-07-03-15-49            59     56.67        NA        NA             56.67
2       A   140701_2014-07-03-15-49            70     67.67        NA        NA             67.67
3    NC_1   140701_2014-07-03-15-49             2     -0.33        NA        NA             -0.33
4    NC_1   140701_2014-07-03-15-49             3      0.67        NA        NA              0.67
5    NC_1   140701_2014-07-03-15-49             2     -0.33        NA        NA             -0.33
6       A 140701_2_2014-07-01-19-07           200    192.00       196        NA            192.00
7       A 140701_2_2014-07-01-19-07           202    194.00       198        NA            194.00       
8       B 140701_2_2014-07-01-19-07           300    292.00       296        NA            292.00
9       B 140701_2_2014-07-01-19-07           322    314.00       318        NA            314.00
10      B 140701_2_2014-07-01-19-07           333    325.00       329        NA            325.00
11   NC_1 140701_2_2014-07-01-19-07             8      0.00         4        NA              0.00
12   NC_1 140701_2_2014-07-01-19-07             8      0.00         4        NA              0.00
13   NC_2 140701_2_2014-07-01-19-07             4     -4.00         0        NA             -4.00
14      D   140701_2014-07-02-20-53            44        NA        43        NA             43.00          
15   NC_2   140701_2014-07-02-20-53             0        NA        -1        NA             -1.00
16   NC_2   140701_2014-07-02-20-53             2        NA         1        NA              1.00
17   NC_2   140701_2014-07-02-20-53             1        NA         0        NA              0.00
18      A   140708_2014-07-08-19-20           100     96.00       100        90             90.00
19      A   140708_2014-07-08-19-20           108    104.00       108        98             98.00
20      A   140708_2014-07-08-19-20           111    107.00       111       101            101.00
21      D   140708_2014-07-08-19-20            88     84.00        88        78             78.00
22      D   140708_2014-07-08-19-20            80     76.00        80        70             70.00
23      E   140708_2014-07-08-19-20           645    641.00       645       635            635.00
24   NC_3   140708_2014-07-08-19-20             8      4.00         8        -2             -2.00
25   NC_3   140708_2014-07-08-19-20            12      8.00        12         2              2.00
26   NC_1   140708_2014-07-08-19-20             4      0.00         4        -6             -6.00
27   NC_2   140708_2014-07-08-19-20             0     -4.00         0       -10            -10.00
B. Brooks
  • 1
  • 4

2 Answers2

0

I am not really sure what your description means. At first it seemed that you wanted to subtract the average accross all NC1 samples from other samples, but your expected output contradicts that. If it would be so, I would suggest this:

##calculate means for control samples:
mean1=mean(df$reaction_conc[grepl("NC1",df$sample)])
mean2=mean(df$reaction_conc[grepl("NC2",df$sample)])
mean3=mean(df$reaction_conc[grepl("NC3",df$sample)])
##subtracts them from other data:
df=mutate(df,
minusNC_1 = reaction_conc - mean1,
minusNC_2 = reaction_conc - mean2,
minusNC_3 = reaction_conc - mean3,
minusNC_highest = reaction_conc - max(c(mean1,mean2,mean3)))
Maksim Gayduk
  • 1,051
  • 6
  • 13
0

Sorry for the poorly worded question. After talking things over with my lab mates, I ended up with this:

#load dplyr    
library(dplyr)

#subset negative controls    
neg <- subset(data, sample == "NC_1" | sample == "NC_2" | sample == "NC_3")

#find mean of each control type in each experiment
neg_sub_mean <- neg %>% group_by(sample, exprFile) %>% summarise(mean = mean(DF_correction, na.rm=TRUE))

#find control with maximum value
neg_sub_mean_max <- neg_sub_mean %>% group_by(exprFile) %>% summarise(max_neg = max(mean))

#merge the original data dataframe and the newly created dataframe with the means of the maximum control per each experiment
df_merge <- full_join(data, neg_sub_mean_max, "exprFile")

#subtract the max negative control from the sample values
df_merge <- df_merge %>% mutate(minusNeg = reaction_conc - max_neg)
B. Brooks
  • 1
  • 4