Table with function in R

Question

I need to compute some descriptive statistics on a dataset. I need to create a table that gives, for each level of a factor, the mean of another variable.

city   mean(age) 
 1       14    
 2       15    
 3       23    
 4       34

Which is the fastest way to do it in R?

I also need to do the same thing in two dimensions:

mean(age)
city    male   female
 1       12      13
 2       15      16
 3       21      22
 4       34      33

I also wonder whether it is possible to apply other functions, such as max, min, sum, and so on.

Edit: I have added a dataset to make it easier to create examples:

dato <- data.frame(years=rep(c(12,13,14,15,15,16,34,67,45,78,17,42),2),sex=rep(c("M","F"),12),city=rep(c(1,2,3,4,4,3,2,1),3))

Thank you! Aggregate could be a solution for creating a data set with the information that I need. This solves my first step. But how do I create a cross table with the function? – dax90, Jul 25 '14 at 09:33
@Roland, `aggregate` is certainly not the fastest way :) even `tapply` is faster – David Arenburg, Jul 25 '14 at 10:46
@user3384159, there are many ways to do this but please provide a data set, not only the desired output – David Arenburg, Jul 25 '14 at 10:47

David Arenburg · Accepted Answer · 2014-07-25T11:20:50.610

You could try dcast (I use the data.table package here because its dcast is faster on big data sets):

library(data.table)
library(reshape2)
dcast.data.table(setDT(dato), city ~ sex, value.var = "years", fun = mean)

#    city        F        M
# 1:    1 41.33333 24.00000
# 2:    2 35.66667 21.66667
# 3:    3 35.66667 21.66667
# 4:    4 41.33333 24.00000
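
For comparison, here is a minimal base R sketch of the same kind of table using tapply, which the comments on the question mention. It rebuilds the example data under the name dato; the result names by_city, by_city_sex and max_by_city_sex are illustrative only, and nothing here is tuned for speed on large data.

```r
# Example data from the question, stored as `dato`
dato <- data.frame(
  years = rep(c(12, 13, 14, 15, 15, 16, 34, 67, 45, 78, 17, 42), 2),
  sex   = rep(c("M", "F"), 12),
  city  = rep(c(1, 2, 3, 4, 4, 3, 2, 1), 3)
)

# One dimension: mean of years for each city (a named vector)
by_city <- tapply(dato$years, dato$city, mean)

# Two dimensions: cities in rows, sexes in columns (a matrix)
by_city_sex <- tapply(dato$years, list(city = dato$city, sex = dato$sex), mean)

# Any other summary function can be swapped in, for example max
max_by_city_sex <- tapply(dato$years, list(city = dato$city, sex = dato$sex), max)

print(by_city_sex)
```

Passing a list of two grouping vectors is what turns the result into a cross table; min or sum can be used in place of mean in the same way.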

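You could also use data.table's regular grouping syntax (keyby sorts the result by city and sex, and leaving the result unassigned keeps dato intact for the next example):

setDT(dato)[, list(mean = mean(years)), keyby = list(city, sex)]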

#    city sex     mean
# 1:    1   F 41.33333
# 2:    1   M 24.00000
# 3:    2   F 35.66667
# 4:    2   M 21.66667
# 5:    3   F 35.66667
# 6:    3   M 21.66667
# 7:    4   F 41.33333
# 8:    4   M 24.00000
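
Since the question also asks about max, min and sum, the same grouping call can return several summaries at once. This is a minimal sketch that assumes the data.table package is installed and rebuilds the example data as a data.table; the name stats is illustrative only.

```r
library(data.table)

# Example data from the question, built directly as a data.table
dato <- data.table(
  years = rep(c(12, 13, 14, 15, 15, 16, 34, 67, 45, 78, 17, 42), 2),
  sex   = rep(c("M", "F"), 12),
  city  = rep(c(1, 2, 3, 4, 4, 3, 2, 1), 3)
)

# Several summaries of years per city/sex group; keyby sorts the result
stats <- dato[, .(mean = mean(years),
                  min  = min(years),
                  max  = max(years),
                  sum  = sum(years)),
              keyby = .(city, sex)]

print(stats)
```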

Or use the dplyr package (also very fast):

library(dplyr)
dato %>%
  group_by(city, sex) %>%
  summarize(mean(years))

#   city sex mean(years)
# 1    1   F    41.33333
# 2    1   M    24.00000
# 3    2   F    35.66667
# 4    2   M    21.66667
# 5    3   F    35.66667
# 6    3   M    21.66667
# 7    4   F    41.33333
# 8    4   M    24.00000

edited Jul 25 '14 at 11:20

answered Jul 25 '14 at 11:05

David Arenburg


+1 for seeing you post a `dplyr` answer (including it at least) :) – talat Jul 25 '14 at 11:37
@beginneR, I knew you are online, so had to write it too so the OP won't say something like "Oh I like that nice syntax of dplyr blabla" and accept your answer. Happens to me all the time – David Arenburg Jul 25 '14 at 11:40
Haha, that's interesting isn't it? Also the reason I started learning `dplyr` - for a newbie it's easier to understand. – talat Jul 25 '14 at 11:44

talat · Answer 2 · 2014-07-25T11:58:22.463

Since you also asked how to apply several functions to one or more columns, you can do that easily with dplyr like this:

library(dplyr)

dato %>%
  group_by(city, sex) %>%
  summarise_each(funs(mean, min, max, sum))

#Source: local data frame [8 x 6]
#Groups: city
#
#  city sex     mean min max sum
#1    1   F 41.33333  15  67 124
#2    1   M 24.00000  12  45  72
#3    2   F 35.66667  13  78 107
#4    2   M 21.66667  14  34  65
#5    3   F 35.66667  13  78 107
#6    3   M 21.66667  14  34  65
#7    4   F 41.33333  15  67 124
#8    4   M 24.00000  12  45  72

By default, this applies the given functions to all columns except the grouping variables (city, sex). Since you only have three columns, the functions are applied only to the years column. You can also specify which columns to apply the functions to, or which to exclude, by changing the summarise_each call to either

summarise_each(funs(mean, min, max, sum), c(col1, col2))  # include only col1 and col2
summarise_each(funs(mean, min, max, sum), -c(col2, col3)) # exclude col2 and col3
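
In more recent dplyr releases (1.0.0 and later), summarise_each() and funs() are deprecated, and across() is the documented replacement. Below is a minimal sketch of the equivalent call on the example data, assuming such a version is installed; the name res is illustrative only.

```r
library(dplyr)

# Example data from the question, stored as `dato`
dato <- data.frame(
  years = rep(c(12, 13, 14, 15, 15, 16, 34, 67, 45, 78, 17, 42), 2),
  sex   = rep(c("M", "F"), 12),
  city  = rep(c(1, 2, 3, 4, 4, 3, 2, 1), 3)
)

# Apply a named list of functions to the years column within each group
res <- dato %>%
  group_by(city, sex) %>%
  summarise(
    across(years, list(mean = mean, min = min, max = max, sum = sum)),
    .groups = "drop"
  )

print(res)
```

With a named list of functions, the result columns are called years_mean, years_min, and so on. A selection such as c(col1, col2) or -c(col2, col3) can take the place of years to include or exclude columns.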
