I'm attempting to calculate two summary statistics (mean and standard error) from the following data set, where both Location and Adult should be factors.

 Location    Adult    OverComp
 F           1        7
 P           1        8
 P           0        10
 F           1        3
 F           0        11

I would like the output to appear as follows:

Location    Adult     OverComp.m    OverComp.se
F           1         (mean)        (standard error)
F           0         (mean)        (standard error)
P           1         (mean)        (standard error)
P           0         (mean)        (standard error)

Here OverComp.m is the mean for each combination of Location x Adult, and OverComp.se is the standard error for each of those combinations. I want this format so that I can use it with ggplot2 to make a bar plot of the four means and standard errors, color-coded by Location.

I've gotten this far:

 summary.OverComp <- data.frame(
   Location=levels(as.factor(data$FLocation)),
   MeanOverComp=tapply(data$OverComp, list(data$FLocation,data$Adult), mean),
   se=tapply(data$OverComp, list(data$FLocation,data$Adult),std.error))

Which produces the statistics I want, but not the format that I need for plotting in ggplot2 (as far as I can tell):

summary.OverComp
       Location   MeanOverComp.0 MeanOverComp.1   se.0      se.1
 F     Fiji       7.238095       8.454545         0.3792062 0.3023071
 P     Peru       6.893617       5.395833         0.4544304 0.3076155

I am now a bit clueless - not sure whether to pursue a different method for plotting, or a transformation to the above output, or to figure out how to incorporate Adult as a factor in my summary coding. I have an inkling that reshape2 may be involved, but not sure how to approach that. Your help would be much appreciated!

M.A.Kline
  • Make your life easier and use either `stat_summary` in ggplot (see examples in the help page) or use a package (Hadley's plyr or dplyr or Matt's data.table). – Roland Aug 16 '14 at 17:07
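
The `stat_summary` route mentioned in this comment could look roughly like the sketch below, which lets ggplot2 compute the means and standard errors from the raw data. It assumes ggplot2 3.3.0 or later (for the `fun` argument) and that the example data from the question are stored in a data frame named `dat`; the object names `dodge` and `p` are just illustrative.

```r
library(ggplot2)

# Example data from the question (assumed to live in `dat`)
dat <- data.frame(
  Location = factor(c("F", "P", "P", "F", "F")),
  Adult    = factor(c(1, 1, 0, 1, 0)),
  OverComp = c(7, 8, 10, 3, 11)
)

# Use the same dodge for bars and error bars so they line up
dodge <- position_dodge(width = 0.9)

p <- ggplot(dat, aes(x = Adult, y = OverComp, fill = Location)) +
  # Bars: mean of OverComp within each Adult x Location group
  stat_summary(fun = mean, geom = "bar", position = dodge) +
  # Error bars: mean +/- 1 standard error, computed by ggplot2's mean_se()
  stat_summary(fun.data = mean_se, geom = "errorbar",
               position = dodge, width = 0.2)
p
```

Groups with a single observation have no defined standard error, so with this tiny example only one error bar would be drawn.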

2 Answers

You could try data.table (if dat is the dataset):

 library(plotrix)     # provides std.error()
 library(data.table)

 setDT(dat)[,list(OverComp.m=mean(OverComp),
                 OverComp.se=std.error(OverComp)), by=list(Location, Adult)]
 #   Location Adult OverComp.m OverComp.se
 #1:        F     1          5           2
 #2:        P     1          8          NA
 #3:        P     0         10          NA
 #4:        F     0         11          NA
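
A summary table in this long layout can go straight into ggplot2. The following is only a sketch: it assumes ggplot2 is installed, rebuilds the table by hand under the hypothetical name `summ` (values copied from the output above), and uses `geom_col` with `geom_errorbar` as one possible way to draw the bars.

```r
library(ggplot2)

# Summary table in the long layout shown above; `summ` is a hypothetical name
summ <- data.frame(
  Location    = factor(c("F", "P", "P", "F")),
  Adult       = factor(c(1, 1, 0, 0)),
  OverComp.m  = c(5, 8, 10, 11),
  OverComp.se = c(2, NA, NA, NA)
)

# Shared dodge so each error bar sits on top of its own bar
dodge <- position_dodge(width = 0.9)

p <- ggplot(summ, aes(x = Adult, y = OverComp.m, fill = Location)) +
  geom_col(position = dodge) +
  # Rows with NA standard errors are dropped from this layer with a warning
  geom_errorbar(aes(ymin = OverComp.m - OverComp.se,
                    ymax = OverComp.m + OverComp.se),
                position = dodge, width = 0.2)
p
```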
akrun

This is a typical use for aggregate, a base (actually stats-pkg) function:

aggregate(dat$OverComp, # the values being aggregated
          dat[-3],   # the grouping factors
          function(Ov) c(mean=mean(Ov), sd=sd(Ov) ) # aggregation function(s)
          )
  Location Adult    x.mean      x.sd
1        F     0 11.000000        NA
2        P     0 10.000000        NA
3        F     1  5.000000  2.828427
4        P     1  8.000000        NA

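If you had more than one observation in each of the three groups where you see NAs, the output would be more attractive. Note that this reports the standard deviation; divide it by sqrt(length(Ov)) to get the standard error.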
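
To get column names like those requested in the question, something along these lines might work in base R alone. It is a sketch under a few assumptions: the data are in `dat`, `se` is a hypothetical helper (not a base function), and the formula interface of `aggregate` is used instead of the call above. `aggregate` keeps the two statistics in a single matrix column, so `do.call(data.frame, ...)` is used to flatten it.

```r
# Example data from the question (assumed to be stored in `dat`)
dat <- data.frame(
  Location = factor(c("F", "P", "P", "F", "F")),
  Adult    = factor(c(1, 1, 0, 1, 0)),
  OverComp = c(7, 8, 10, 3, 11)
)

# Hypothetical helper: standard error of the mean (NA for a single value)
se <- function(x) sd(x) / sqrt(length(x))

# aggregate() stores both statistics in one matrix column named OverComp
res <- aggregate(OverComp ~ Location + Adult, data = dat,
                 FUN = function(x) c(m = mean(x), se = se(x)))

# Flatten the matrix column into ordinary columns OverComp.m and OverComp.se
res <- do.call(data.frame, res)
res
```

The flattened `res` has one row per Location x Adult combination, which is the shape ggplot2 expects.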

IRTFM