averaging a column in a dataframe based on another column -- all into one array

Question

x

 primer  timepoints  foldInduction
  Acan         0      1.0000000
  Acan        20      0.6779533
  Acan        20      1.9734277
   Fos        40     21.3731640
   Fos        60      1.8517668
   Fos        40    118.2970756
  Acan         0      1.0000000
   Fos        60     17.5241529

I want two things: (1) the mean and (2) the standard error of foldInduction for every primer at every timepoint. What I would like is a final array where names(array) are the timepoints and the values are the means. I'm also trying to do the same for the standard errors.

So for primer 'Acan', the means would look something like this:

 0     20 
1.0   1.325

I figured tapply might work for this. This is what I've been doing:

       stderr <- function(x){sd(x,na.rm=TRUE)/sqrt(length(x))}
       means <- tapply(x$foldInduction,factor(as.numeric(x$timepoints)),mean,na.rm=TRUE)
       stderrs <- tapply(x$foldInduction,factor(as.numeric(x$timepoints)),stderr)

Also, there may not be the same number of foldInduction values to average for a given timepoint, but I don't think this should be a problem.

If you could help me create this array for one primer, that would be great.

score 0 · Accepted Answer · edited May 23 '17 at 10:24

I foresee an onslaught of answers forthcoming. There have been hundreds of questions along these same lines, often comparing the relative timing and merits of many solutions. Here is one such question and answer. I recommend you find one framework that works for you and stick with it.

Here's a solution using plyr and summarize.

First, recreate your data:

x <- read.table(text = "primer      exptname concentrate timepoints replicate    day   realConc foldInduction
  Acan           0hr        55mM          0        b1 011311 0.05875824     1.0000000
  Acan KClpulse-5min        55mM         20        b1 011311 0.03983534     0.6779533
  Acan KClpulse-5min        55mM         20        b1 011311 0.11595514     1.9734277
   Fos KClpulse-5min        55mM         40        b1 011311 0.11964684    21.3731640
   Fos KClpulse-5min        55mM         60        b1 011311 0.01036618     1.8517668
   Fos KClpulse-5min        55mM         40        b1 011311 0.66222632   118.2970756
  Acan           0hr        55mM          0        b2 011411 0.05681637     1.0000000
   Fos KClpulse-5min        55mM         60        b2 011411 0.23492697    17.5241529", header = TRUE)

Then do some group by magic with ddply:

library(plyr)
# Group by primer and timepoint, then compute the mean and standard error per group
ddply(x, .(primer, timepoints), summarize,
      mean = mean(foldInduction, na.rm = TRUE),
      sde = sqrt(var(foldInduction, na.rm = TRUE)/length(foldInduction))
      )

  primer timepoints     mean        sde
1   Acan          0  1.00000  0.0000000
2   Acan         20  1.32569  0.6477372
3    Fos         40 69.83512 48.4619558
4    Fos         60  9.68796  7.8361931

I didn't completely follow the last part of your question about the named vector, but hopefully this shows you how to compute the values you need, and you can munge the data into the appropriate format from there.
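For the named-vector format the question asks for, one possible base R sketch is to subset to a single primer and call tapply on that subset. This is an illustration rather than the only way to do it: the helper name se and the variable names acan, acan_means, acan_ses and all_means are made up for this example. The helper is called se rather than stderr so that it does not mask base R's stderr() function, and it divides by the count of non-missing values, which is an assumption about how missing values should be treated.

```r
# Minimal copy of the question's data (only the three columns needed here).
x <- data.frame(
  primer = c("Acan", "Acan", "Acan", "Fos", "Fos", "Fos", "Acan", "Fos"),
  timepoints = c(0, 20, 20, 40, 60, 40, 0, 60),
  foldInduction = c(1.0000000, 0.6779533, 1.9734277, 21.3731640,
                    1.8517668, 118.2970756, 1.0000000, 17.5241529)
)

# Hypothetical helper: standard error of the mean over non-missing values.
se <- function(v) sd(v, na.rm = TRUE) / sqrt(sum(!is.na(v)))

# Keep one primer, then group foldInduction by timepoint.
acan <- x[x$primer == "Acan", ]
acan_means <- tapply(acan$foldInduction, acan$timepoints, mean, na.rm = TRUE)
acan_ses <- tapply(acan$foldInduction, acan$timepoints, se)

acan_means  # names(acan_means) are the timepoints "0" and "20"
acan_ses

# All primers at once: a primer-by-timepoint matrix, with NA where a
# primer has no rows at that timepoint.
all_means <- tapply(x$foldInduction, list(x$primer, x$timepoints),
                    mean, na.rm = TRUE)
all_means
```

Groups of different sizes are handled per group, so an unequal number of values per timepoint is not a problem here. A group with a single value gives NA for the standard error, because sd is undefined for one observation.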
