-2

I need to compute some descriptive statistics on a dataset. Specifically, I need a table that gives, for each level of a factor, the mean of another variable:

city   mean(age) 
 1       14    
 2       15    
 3       23    
 4       34    

What is the fastest way to do this in R?

I also need to do the same thing in two dimensions:

mean(age)   male   female 
 city      
 1          12       13     
 2          15       16
 3          21       22
 4          34       33

I also wonder whether other functions, such as max, min, or sum, can be applied in the same way.

Edit: here is a dataset to make examples easier to create (the age variable is stored in the years column):

dato <- data.frame(years=rep(c(12,13,14,15,15,16,34,67,45,78,17,42),2),sex=rep(c("M","F"),12),city=rep(c(1,2,3,4,4,3,2,1),3))
Brian Tompsett - 汤莱恩
dax90

2 Answers

2

You could try dcast (I added the data.table package because its dcast is faster on big data sets):

library(data.table)
library(reshape2)
dcast.data.table(setDT(dato), city ~ sex, value.var = "years", fun.aggregate = mean)

#    city        F        M
# 1:    1 41.33333 24.00000
# 2:    2 35.66667 21.66667
# 3:    3 35.66667 21.66667
# 4:    4 41.33333 24.00000

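You could also use plain data.table grouping:

# dato_means is an illustrative name; storing the summary separately keeps dato intact for the code below
dato_means <- setkey(setDT(dato)[, list(mean = mean(years)), by = list(city, sex)])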

#    city sex     mean
# 1:    1   F 41.33333
# 2:    1   M 24.00000
# 3:    2   F 35.66667
# 4:    2   M 21.66667
# 5:    3   F 35.66667
# 6:    3   M 21.66667
# 7:    4   F 41.33333
# 8:    4   M 24.00000
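
The question also asked for the single-factor table and for other statistics such as max, min and sum. Here is a minimal data.table sketch of both, assuming the example data from the question. The names dt, by_city and stats are only illustrative and do not come from the original answer.

```r
library(data.table)

# Example data from the question
dt <- data.table(
  years = rep(c(12, 13, 14, 15, 15, 16, 34, 67, 45, 78, 17, 42), 2),
  sex   = rep(c("M", "F"), 12),
  city  = rep(c(1, 2, 3, 4, 4, 3, 2, 1), 3)
)

# One factor: mean of years per city (keyby also sorts the result by city)
by_city <- dt[, .(mean_years = mean(years)), keyby = city]

# Two factors, several statistics computed in one grouped call
stats <- dt[, .(mean = mean(years), min = min(years),
                max = max(years), sum = sum(years)),
            keyby = .(city, sex)]

print(by_city)
print(stats)
```

Using keyby rather than by is a choice made here so that the rows come back ordered by the grouping columns. With by, the groups appear in order of first occurrence.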

Or use the dplyr package (also very fast):

library(dplyr)
dato %>%
  group_by(city, sex) %>%
  summarize(mean(years))

#   city sex mean(years)
# 1    1   F    41.33333
# 2    1   M    24.00000
# 3    2   F    35.66667
# 4    2   M    21.66667
# 5    3   F    35.66667
# 6    3   M    21.66667
# 7    4   F    41.33333
# 8    4   M    24.00000
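
If installing packages is not an option, base R can produce both tables from the question as well. This is a sketch that is not part of the original answer. It assumes the example data from the question, and the result names (by_city, by_city_sex, max_by_city_sex) are made up for illustration.

```r
# Example data from the question
dato <- data.frame(
  years = rep(c(12, 13, 14, 15, 15, 16, 34, 67, 45, 78, 17, 42), 2),
  sex   = rep(c("M", "F"), 12),
  city  = rep(c(1, 2, 3, 4, 4, 3, 2, 1), 3)
)

# One factor: a data frame with one row per city and the mean of years
by_city <- aggregate(years ~ city, data = dato, FUN = mean)

# Two factors: a matrix with cities in rows and sexes in columns
by_city_sex <- tapply(dato$years, list(city = dato$city, sex = dato$sex), mean)

# Any other summary function can be swapped in, for example max
max_by_city_sex <- tapply(dato$years, list(city = dato$city, sex = dato$sex), max)

print(by_city)
print(by_city_sex)
print(max_by_city_sex)
```

This is likely slower than data.table or dplyr on large data, but tapply returns the wide city-by-sex layout directly.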
David Arenburg
  • +1 for seeing you post a `dplyr` answer (including it at least) :) – talat Jul 25 '14 at 11:37
  • @beginneR, I knew you are online, so had to write it too so the OP won't say something like "Oh I like that nice syntax of dplyr blabla" and accept your answer. Happens to me all the time – David Arenburg Jul 25 '14 at 11:40
  • Haha, that's interesting isn't it? Also the reason I started learning `dplyr` - for a newbie it's easier to understand. – talat Jul 25 '14 at 11:44
1

Since you also asked how to apply several functions to one or more columns, you can do that easily with dplyr like this:

library(dplyr)

dato %>%
  group_by(city, sex) %>%
  summarise_each(funs(mean, min, max, sum))

#Source: local data frame [8 x 6]
#Groups: city
#
#  city sex     mean min max sum
#1    1   F 41.33333  15  67 124
#2    1   M 24.00000  12  45  72
#3    2   F 35.66667  13  78 107
#4    2   M 21.66667  14  34  65
#5    3   F 35.66667  13  78 107
#6    3   M 21.66667  14  34  65
#7    4   F 41.33333  15  67 124
#8    4   M 24.00000  12  45  72

By default, this applies the given functions to all columns except the grouping variables (city, sex). Since you only have three columns, the functions are applied only to the years column. You can also specify which columns to include or exclude by changing the summarise_each call to either of the following:

summarise_each(funs(mean, min, max, sum), c(col1, col2))  # include only col1 and col2
summarise_each(funs(mean, min, max, sum), -c(col2, col3)) # exclude col2 and col3
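
In dplyr 1.0.0 and later, summarise_each() and funs() are deprecated in favour of across(). The following sketch shows how the same summary might look there, assuming the example data from the question. The result name res is illustrative, and the output columns get the default names years_mean, years_min, and so on.

```r
library(dplyr)

# Example data from the question
dato <- data.frame(
  years = rep(c(12, 13, 14, 15, 15, 16, 34, 67, 45, 78, 17, 42), 2),
  sex   = rep(c("M", "F"), 12),
  city  = rep(c(1, 2, 3, 4, 4, 3, 2, 1), 3)
)

res <- dato %>%
  group_by(city, sex) %>%
  summarise(
    # apply each named function to the years column
    across(years, list(mean = mean, min = min, max = max, sum = sum)),
    .groups = "drop"  # return an ungrouped result
  )

print(res)
```

To cover several columns, the first argument of across() accepts a selection such as c(col1, col2) or -c(col2, col3), mirroring the include and exclude forms above.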
talat