I was hoping someone could help me with the following problem:

I am attempting to make a combined barplot showing the mean and standard errors for 3 different continuous variables (body temp, length, mass) recorded for a binary variable (gender).

I have been able to plot the mean values for each variable, but I can't calculate the standard error for these 3 variables with any of the code I've tried. I tried many things, but I think I was on the right track with this:

    View(test4)
    test4 <- aggregate(test4, 
             by = list(Sex = test4$Sex), 
             FUN = function(x) c(mean = mean(x), sd = sd(x),
                                 n = length(x)))
    test4
    #this produced mean, sd, length for ALL variables (including sex)
    test4<-do.call(test4)
    test4$se<-test4$x.sd / sqrt(test4$x.n)

Then I kept getting the error:

    Error in sqrt(test4$x.n) : non-numeric argument to mathematical function

I tried to recode to target my 3 variables after `aggregate(test4...)`, but I couldn't get it to work. Then I subsetted my resulting data frame to exclude sex, but that didn't work either. I then tried to define it as a matrix or a vector, but that still didn't work.

I would like my final graph to have y axis = mean values and x axis = variable (3 sub-groups: Tb, Mass, Length), with two bars side by side showing male and female values for comparison.

Any help or direction anyone could provide would be greatly appreciated!!

Many thanks in advance! :)

aosmith
brittany
  • This currently reads like a question about `aggregate`, not a question about plotting. For plotting you could try to play around with something along the lines of [this answer](http://stackoverflow.com/a/19299034/2461552). – aosmith May 10 '16 at 15:45

2 Answers

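`aggregate` gives awkward output when the function you pass returns more than one value: the results for each variable are stored as a single matrix column rather than as separate columns, so names like `x.n` don't exist as columns you can access with `$`. If you wish to use `aggregate`, I would compute the mean and the SE in separate calls to `aggregate`.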
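A minimal sketch of the separate-calls approach, assuming simulated data with columns named `Sex`, `bodytemp`, `length` and `mass` (these names, and the helper `se`, are made up for illustration and are not the asker's real data):

```r
# Simulated data; the column names are assumptions for illustration only
set.seed(1)
test4 <- data.frame(Sex = rep(c("M", "F"), 50),
                    bodytemp = rnorm(100),
                    length = rnorm(100),
                    mass = rnorm(100))

# Hypothetical helper: standard error of the mean = sd / sqrt(n)
se <- function(x) sd(x) / sqrt(length(x))

# One aggregate() call per statistic, so every result is a plain numeric column
means <- aggregate(cbind(bodytemp, length, mass) ~ Sex, data = test4, FUN = mean)
ses   <- aggregate(cbind(bodytemp, length, mass) ~ Sex, data = test4, FUN = se)

# Join the two tables; the suffixes distinguish means from standard errors
combined <- merge(means, ses, by = "Sex", suffixes = c(".mean", ".se"))
combined
```

The result has one row per sex, with columns such as `mass.mean` and `mass.se` that can be used directly in arithmetic.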

However, here is a solution using tidyr and dplyr that I don't think is too bad.

I've created some data. I hope it looks like yours. It is so useful to include a simulated dataset with your question.

    library(tidyr)
    library(dplyr)
    library(ggplot2)

    # Create some data
    test4 <- data.frame(Sex = rep(c('M', 'F'), 50),
                        bodytemp = rnorm(100),
                        length = rnorm(100),
                        mass = rnorm(100))

    # Gather the data to 'long' format so that bodytemp, length and mass are all in one column
    longdata <- gather(test4, variable, value, -Sex)
    head(longdata)

    # Create the summary statistics separately for each sex and variable (i.e. bodytemp, length and mass)
    # Standard error of the mean = sd / sqrt(n)
    summary_stats <- longdata %>%
      group_by(Sex, variable) %>%
      summarise(mean = mean(value), se = sd(value) / sqrt(length(value)))

    # Plot: dodged bars with error bars of +/- 1 standard error
    ggplot(summary_stats, aes(x = variable, y = mean, fill = Sex)) +
      geom_bar(stat = 'identity', position = 'dodge') +
      geom_errorbar(aes(ymin = mean - se, ymax = mean + se),
                    width = 0.2,
                    position = position_dodge(0.9))

[Image: output bar chart]

timcdlucas
  • Thanks for your help! Unfortunately when I followed this script it didn't produce the graph I wanted it to (probably because we were working with different datasets), but it did get me started with organizing my data the long way and then I was able to join it with another script I used when I just had one output. I will definitely include a dataset next time! Thanks again for your help :) – brittany May 11 '16 at 04:14

My final plot

Update: I was able to answer my question by combining the initial part of timcdlucas's script with another one I had used when plotting just one output. For anyone else who may be looking for an answer to a similar question, I have posted my script and the resulting graph (see link above):

    library(tidyr)
    library(ggplot2)

    View(test3) # this data frame has the columns Sex, Tb, Mass, SVL
    newtest <- test3
    View(newtest)

    # Transform data to 'long' format, combining all variables in one column
    longdata <- gather(newtest, variable, value, -Sex)
    View(longdata)

    # Set up the summary table: mean, sd and n for each Sex / Variable combination
    longdata2 <- aggregate(longdata$value,
                           by = list(Sex = longdata$Sex, Variable = longdata$variable),
                           FUN = function(x) c(mean = mean(x), sd = sd(x),
                                               n = length(x)))
    # Flatten the matrix column returned by aggregate() into x.mean, x.sd, x.n
    longdata2 <- do.call(data.frame, longdata2)
    longdata2$se <- longdata2$x.sd / sqrt(longdata2$x.n)
    colnames(longdata2) <- c("Sex", "Variable", "mean", "sd", "n", "se")
    longdata2$names <- paste(longdata2$Variable, "Variable /", longdata2$Sex, "Sex")
    View(longdata2)

    # Error bar limits: mean +/- 1 standard error
    limits <- aes(ymax = mean + se, ymin = mean - se)

    # To order the bars in the way I desire *might not be necessary for future scripts*
    positions <- c("Tb", "SVL", "Mass")

    # To plot the new table:
    bfinal <- ggplot(data = longdata2, aes(x = factor(Variable), y = mean,
                                           fill = factor(Sex))) +
      geom_bar(stat = "identity",
               position = position_dodge(0.9)) +
      geom_errorbar(limits, position = position_dodge(0.9),
                    width = 0.25) +
      labs(x = "Variable", y = "Mean") +
      ggtitle("") +
      scale_fill_discrete(name = "",
                          labels = c("Male", "Female")) +
      scale_x_discrete(breaks = c("Mass", "SVL", "Tb"),
                       labels = c("Mass", "SVL", "Tb"),
                       limits = positions)
    bfinal

:)
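A small sketch, on simulated data, of why the `do.call(data.frame, ...)` step matters here. The data frame `d` and its columns are made up for illustration. It also suggests a likely cause of the error in the question: without flattening, `x.n` is not a column, so `$x.n` returns `NULL`, and `sqrt(NULL)` fails with "non-numeric argument to mathematical function".

```r
# Simulated data for illustration only (not the real dataset)
set.seed(1)
d <- data.frame(Sex = rep(c("M", "F"), 10), value = rnorm(20))

agg <- aggregate(d$value, by = list(Sex = d$Sex),
                 FUN = function(x) c(mean = mean(x), sd = sd(x), n = length(x)))

# aggregate() stores all three statistics in ONE matrix column called x
is.matrix(agg$x)   # TRUE
is.null(agg$x.n)   # TRUE: there is no column named x.n yet

# do.call(data.frame, ...) expands the matrix into separate columns
flat <- do.call(data.frame, agg)
names(flat)        # "Sex" "x.mean" "x.sd" "x.n"

# Now the standard error can be computed column-wise
flat$se <- flat$x.sd / sqrt(flat$x.n)
flat
```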

brittany