Mean with condition for multiple columns in r

Question

Let's use mtcars to explain the situation.

What I want to do is the same below for multiple columns. To have the mean of a column qsec (in the example) regarding another column with a specific value (4 and 6, in the example below). I'll compare the result later so maybe I would store the results in a vector

table(mtcars$cyl)
4  6  8
11  7 14

mean(mtcars$qsec[mtcars$cyl == 4], na.rm = T)
mean(mtcars$qsec[mtcars$gear == 4], na.rm = T)

I would like to check the means of qsec regarding the cyl, and let's say gear and carb, with the same "pattern" for the mean i.e. mean of observations with 4 and mean of observations 6. In the true dataset would be several columns that have the same set of numbers (2, 0 and 1). I'll compare the means of a column (in the example qsec) with observations 2 and 0.

I've tried to look at the functions like tapply, apply, sapply. But I'm stuck in having the condition in the mean applying for every column (at once).

Hope I made myself clear. Thank you!

So, to clarify, you want to know how to find the means of `qsec`, `gear`, and `carb` where `cyl` equals `4` or `6`? — phalteman, Jun 14 '18 at 21:53
Hi @phalteman, what I want is to have the means of `qsec` for the columns `gear`,`carb`,`cyl`, when they have number 4 in their observation. So it would be the mean of `qsec` when the observation in `gear` is equal 4. Then, another calc of men of `qsec` when the `gear` is 4, and so on.. for multiple columns. — ivan lange, Jun 15 '18 at 13:42

Karolis Koncevičius · Answer 1 · 2018-06-14T22:14:34.933

5

The function you are looking for is aggregate:

aggregate(. ~ cyl, FUN=mean, data=mtcars[,c("cyl", "qsec", "gear", "carb")], 
          subset=cyl %in% c(4, 6)
          )

  cyl     qsec     gear     carb
1   4 19.13727 4.090909 1.545455
2   6 17.97714 3.857143 3.428571

In the function above data= is the data.frame. Here we only selected the wanted columns. And the subset= specifies which rows of the data to keep (in this case only cyl 4 and 6).

The formula . ~ cyl instructs to summarise all columns according to the cyl column.

edited Jun 14 '18 at 22:14

answered Jun 14 '18 at 21:59

Karolis Koncevičius

9,417
9
56
89

1

In the subset, just use `cyl` instead of `mtcars$cyl` – Onyambu Jun 14 '18 at 22:12
Thank you for the answer. What I want is exactly this but in a way to do it for multiple columns at once like this: `aggregate(. ~ cyl, FUN=mean, data=mtcars[,c("cyl", "qsec")], subset=cyl %in% c(4, 6) )` First according `cyl`. Then, according `gear`, and so on... `aggregate(. ~ gear, FUN=mean, data=mtcars[,c("gear", "qsec")], subset=gear %in% c(4, 6) )` Do you have an idea about how to apply this function in for more columns at once? I have several columns and would be great to find a way to do exactly this in a automatically way. – ivan lange Jun 15 '18 at 14:05

score 1 · Answer 2 · answered Jun 16 '18 at 03:44

1

a data.table solution:

require(data.table)

mtcars[cyl %in% c(4, 6), .(mn_qsec = mean(qsec), 
                           mn_gear = mean(gear), 
                           mn_carb = mean(carb)), 
                      by = cyl]

answered Jun 16 '18 at 03:44

PavoDive

6,322
2
29
55

perfect solution – rane Sep 04 '20 at 05:29

score 0 · Answer 3 · answered Jun 14 '18 at 22:13

On option is to use dplyr::mutate_at as OP wants to apply same function on multiple column. The solution will be as:

library(dplyr)
mtcars %>%
  group_by(cyl) %>%
  summarise_at(vars(c("qsec", "gear", "carb")), funs(mean), na.rm = TRUE) %>%
  filter(cyl!=8)


# # A tibble: 2 x 4
# cyl  qsec  gear  carb
# <dbl> <dbl> <dbl> <dbl>
# 1  4.00  19.1  4.09  1.55
# 2  6.00  18.0  3.86  3.43

score 0 · Accepted Answer · answered Jun 16 '18 at 00:09

What I understand you're looking for is the mean of qsec for each level of cyl, gear, and carb separately, not in combination. This code gets you that, but doesn't directly let you select specific levels of those factors. If you need to be able to do that second part, I think you should be able to tweak this to get there, but I'm not sure how...

apply(mtcars[,c("cyl","gear","carb")], 2, function(x) {
  aggregate(mtcars[,"qsec"],list(x),mean)
})

Output:

$cyl
  Group.1        x
1       4 19.13727
2       6 17.97714
3       8 16.77214

$gear
  Group.1      x
1       3 17.692
2       4 18.965
3       5 15.640

$carb
  Group.1        x
1       1 19.50714
2       2 18.18600
3       3 17.66667
4       4 16.96500
5       6 15.50000
6       8 14.60000

Mean with condition for multiple columns in r

4 Answers4