1

I have a problem using melt and mean on a data frame. Maybe it is just a beginner problem.

I have a data frame with two conditions and 3 replicates

set.seed(1)
testdf <- data.frame(name1=letters[1:5],con1_1=floor(runif(5,20,35)),con1_2=floor(runif(5,20,35)),con1_3=floor(runif(5,20,35)),con2_1=floor(runif(5,20,35)),con2_2=floor(runif(5,20,35)),con2_3=floor(runif(5,20,35)))

print(testdf)
  name1 con1_1 con1_2 con1_3 con2_1 con2_2 con2_3
1     a     23     33     23     27     34     25
2     b     25     34     22     30     23     20
3     c     28     29     30     34     29     25
4     d     33     29     25     25     21     33
5     e     23     20     31     31     24     25

Now I melt the data frame to get something like this and I wand to calculate the mean per condition...

melt(testdf[1:7],id.vars='name1')
   name1 variable value   mean_con1  mean_con2
1      a   con1_1    23    26.33333   28.66667
2      b   con1_1    25    27.00000   24.33333
3      c   con1_1    28    29.00000   29.33333
4      d   con1_1    33    29.00000   26.33333
5      e   con1_1    23    24.66667   26.66667
6      a   con1_2    33    26.33333   28.66667
7      b   con1_2    34    27.00000   24.33333
8      c   con1_2    29    29.00000   29.33333
9      d   con1_2    29    29.00000   26.33333
10     e   con1_2    20    24.66667   26.66667
11     a   con1_3    23    26.33333   28.66667
...

Is there a way to calculate the mean with the melted data frame, or do I have to recreate the original data frame and calculate the mean like below?

testdf$mean_con1 <- apply(testdf[2:4],1,mean)
testdf$mean_con2 <- apply(testdf[5:7],1,mean)

EDIT:

I just realised, I did a mistake in my question. Your answers are right, but what I am actually looking for would be the means in just one column. Alternating depending on the variable... Does this make sense?

melt(testdf[1:7],id.vars='name1')
   name1 variable value       mean
1      a   con1_1    23   26.33333
2      b   con1_1    25   27.00000
3      c   con1_1    28   29.00000
4      d   con1_1    33   29.00000
5      e   con1_1    23   24.66667
6      a   con1_2    33   26.33333
7      b   con1_2    34   27.00000
...
16     a   con2_1    27   28.66667
17     b   con2_1    30   24.33333
18     c   con2_1    34   29.33333
19     d   con2_1    25   26.33333
20     e   con2_1    31   26.66667
21     a   con2_2    34   28.66667
22     b   con2_2    23   24.33333
...

Ok, maybe I explain what I am trying to do. I want to plot the data with ggplot, I want to make bars representing the mean and the values as geom_points as an overlay. I want to color the dots by replicates and group everything by the condition... Hope someone has an idea...

smci
  • 32,567
  • 20
  • 113
  • 146
drmariod
  • 11,106
  • 16
  • 64
  • 110
  • Please check my updated solution – akrun Aug 13 '14 at 12:14
  • 1
    Adding your plotting problem changes the question significantly and is not really a good way to approach asking questions on SO. It's best to isolate these different problems as different questions. – A5C1D2H2I1M1N2O1R2T1 Aug 13 '14 at 12:37
  • The title *'Calculate means of variables in melted dataframe, grouped by name1'* tells us much more than *'problems using melt and mean'*. – smci Nov 06 '16 at 01:40
  • 1
    @smci Thanks for pointing this out. Sometimes it is hard to find a good title for the problem :-) – drmariod Nov 07 '16 at 10:00

5 Answers5

3

Here's an approach using "dplyr" and "tidyr":

library(dplyr)
library(tidyr)
testdf %>%
  gather(var, val, con1_1:con2_3) %>%
  separate(var, c("var", "time")) %>%
  group_by(name1, var) %>%
  summarise(mVal = mean(val)) %>%
  spread(var, mVal)
# Source: local data frame [5 x 3]
# 
#   name1     con1     con2
# 1     a 26.33333 28.66667
# 2     b 27.00000 24.33333
# 3     c 29.00000 29.33333
# 4     d 29.00000 26.33333
# 5     e 24.66667 26.66667

If you want the data to stay in a long form, try:

testdf %>%
  gather(var, val, con1_1:con2_3) %>%
  separate(var, c("var", "time")) %>%
  group_by(name1, var) %>%
  mutate(mVal = mean(val))

The equivalent with "reshape2" would be to use colsplit to split your "variable" column, and use dcast to get your aggregated results.

library(reshape2)
dfL <- melt(testdf, id.vars = "name1")
dfL <- cbind(dfL, colsplit(dfL$variable, "_", c("var", "time")))
dcast(dfL, name1 ~ var, value.var = "value", mean)
#   name1     con1     con2
# 1     a 26.33333 28.66667
# 2     b 27.00000 24.33333
# 3     c 29.00000 29.33333
# 4     d 29.00000 26.33333
# 5     e 24.66667 26.66667

The same, but in long form would be:

library(reshape2)
dfL <- melt(testdf, id.vars = "name1")
dfL <- cbind(dfL, colsplit(dfL$variable, "_", c("var", "time")))
dfL$mVal <- with(dfL, ave(value, name1, var))
A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485
  • I like the reshape2 version! Together with cbind and the melted data frame it gives the result I was looking for, thanks – drmariod Aug 13 '14 at 09:05
  • I just realised, that I did something wrong in my original question. I changed it, maybe you have an idea, how to solve the problem – drmariod Aug 13 '14 at 11:43
2

using base functions

 df1 <- reshape(testdf, idvar="name1", varying=2:7, direction="long",sep="_")
 aggregate(df1[,3:4], list(name1=df1[,1]), FUN=mean)
 #  name1     con1     con2
 #1     a 26.33333 28.66667
 #2     b 27.00000 24.33333
 #3     c 29.00000 29.33333
 #4     d 29.00000 26.33333
 #5     e 24.66667 26.66667

Or you can use data.table to calculate mean after it was reshaped

 library(data.table)
 setDT(df1)[, lapply(.SD, mean),.SDcols=c("con1", "con2"), by=name1]
 #   name1     con1     con2
 #1:     a 26.33333 28.66667
 #2:     b 27.00000 24.33333
 #3:     c 29.00000 29.33333
 #4:     d 29.00000 26.33333
 #5:     e 24.66667 26.66667

Update

If you need it in the long format

 library(reshape2)
 res <-  within( melt(testdf, id="name1"), {
         variable<- gsub("\\_.*","",variable)
          Mean<- ave(value, name1,variable)})

head(res)
#  name1 variable value     Mean
#1     a     con1    23 26.33333
#2     b     con1    25 27.00000
#3     c     con1    28 29.00000
#4     d     con1    33 29.00000
#5     e     con1    23 24.66667
#6     a     con1    33 26.33333

library(ggplot2)
res$repl <- rep(rep(1:3, each=5),2)
p <- ggplot(res, aes(name1, Mean))
p +
 geom_bar(position="dodge", stat="identity", fill="LightBlue") +
 geom_point(data=res, aes(name1, value, colour=repl))+
 facet_wrap(~variable) +
 theme_bw()

enter image description here

akrun
  • 874,273
  • 37
  • 540
  • 662
0
set.seed(1)
testdf <- data.frame(name1=letters[1:5],con1_1=floor(runif(5,20,35)),con1_2=floor(runif(5,20,35)),con1_3=floor(runif(5,20,35)),con2_1=floor(runif(5,20,35)),con2_2=floor(runif(5,20,35)),con2_3=floor(runif(5,20,35)))
require(reshape2)
melted  <- melt(testdf[1:7],id.vars='name1')
mean(melted$value)

Output:

> mean(melted$value)
[1] 27.16667
tagoma
  • 3,896
  • 4
  • 38
  • 57
0

You can use the ddply method from the plyr package on your melted frame:

require(ddply)
m = melt(testdf[1:7],id.vars='name1')
reshape(ddply(m, .(name1, con=substr(variable, 1,4)), summarize, m = mean(value)), timevar="con", idvar="name1", direction="wide") 

That gives

  name1   m.con1   m.con2
1     a 26.33333 28.66667
3     b 27.00000 24.33333
5     c 29.00000 29.33333
7     d 29.00000 26.33333
9     e 24.66667 26.66667
martin
  • 3,149
  • 1
  • 24
  • 35
0

You can use 'aggregate´, then.

set.seed(1)
testdf <- data.frame(name1=letters[1:5],con1_1=floor(runif(5,20,35)),con1_2=floor(runif(5,20,35)),con1_3=floor(runif(5,20,35)),con2_1=floor(runif(5,20,35)),con2_2=floor(runif(5,20,35)),con2_3=floor(runif(5,20,35)))
require(reshape2)
melted  <- melt(testdf[1:7],id.vars='name1')
meanbygroup <- aggregate(melted$value, by=list(melted$variable), FUN=mean)
tagoma
  • 3,896
  • 4
  • 38
  • 57