Select a maximum value across rows and columns with grouped data

Question

The data below have an IndID field as well as three columns containing numbers, including NA in some instances, with a varying number of rows for each IndID.

library(dplyr)
n = 10
set.seed(123)
dat <- data.frame(IndID = sample(c("AAA", "BBB", "CCC", "DDD"), n, replace = T),
                  Num1 = c(2,4,2,4,4,1,3,4,3,2),
                  Num2 = sample(c(1,2,5,8,7,8,NA), n, replace = T),
                  Num3 = sample(c(NA, NA,NA,8,7,9,NA), n, replace = T)) %>%
  arrange(IndID)

 head(dat)
  IndID Num1 Num2 Num3
1   AAA    1   NA    7
2   BBB    2   NA   NA
3   BBB    2    7    7
4   BBB    2   NA   NA
5   CCC    3    2    8
6   CCC    3    5   NA

For each IndID, I would like to make a new column Max that contains the maximum value for Num1:Num3. In most instances this involves finding the max value across multiple rows and columns. Within dplyr I am missing the final step (below) and would appreciate any suggestions.

dat %>%
  group_by(IndID) %>%
  mutate(Max = "???")

akrun · Accepted Answer · 2018-07-29T21:33:23.440

An option is pmax to get the rowwise maxs

dat %>%        
   mutate(Max = pmax(Num1, Num2, Num3, na.rm = TRUE))

If there are many columns, we can get the column names, convert it to symbol and then evaluate (!!!)

dat %>%        
    mutate(Max = pmax(!!! rlang::syms(names(.)[-1]), na.rm = TRUE))
# A tibble: 10 x 5
# Groups:   IndID [4]
#   IndID  Num1  Num2  Num3   Max
#   <fct> <dbl> <dbl> <dbl> <dbl>
# 1 AAA       1    NA     7     7
# 2 BBB       2    NA    NA     2
# 3 BBB       2     7     7     7
# 4 BBB       2    NA    NA     2
# 5 CCC       3     2     8     8
# 6 CCC       3     5    NA     5
# 7 DDD       4     8     7     8
# 8 DDD       4     7    NA     7
# 9 DDD       4     1     7     7
#10 DDD       4     1     7     7

If this is to get the max of all 'Num' column grouped by 'IndID', there are multiple ways.

1) From the above step, we can extend it to group by 'IndID' and then take the max of row maxs ('Max')

dat %>%        
    mutate(Max = pmax(!!! rlang::syms(names(.)[-1]), na.rm = TRUE)) %>% 
    group_by(IndID) %>% 
    mutate(Max = max(Max))

2) Another option is to convert the 'wide' format to 'long' with gather, then grouped by 'IndID', get the max of 'val' column and right_join with the original dataset

library(tidyverse)
gather(dat, key, val, -IndID) %>% 
    group_by(IndID) %>% 
    summarise(Max = max(val,na.rm = TRUE)) %>% 
    right_join(dat)

3) Or another option without reshaping into 'long' format would be to nest the dataset after grouping by 'IndID', unlist and get the max of the 'Num' columns

dat %>%
   group_by(IndID) %>%
   nest %>%
   mutate(data =  map(data,  ~ .x %>% 
                                mutate(Max = max(unlist(.), na.rm = TRUE)))) %>% 
   unnest

Select a maximum value across rows and columns with grouped data

1 Answers1

Linked