
I am hoping to find a vectorized approach to get the absolute maximum value from multiple columns in a data frame.

Basically, is there an equivalent of the pmax function that returns the value with the largest absolute value, keeping its sign?

library(tibble)

test_df <- tibble(
  some_identifier = c("apple", "tunafish", "turkey_sandwich"),
  val_a = c(-1, 2, 0),
  val_b = c(-3, 3, NA),
  val_c = c(2, 3, 1)
)

# this is what abs_max column should be 
test_df$abs_max <- c(-3, 3, 1)
test_df

# A tibble: 3 x 5
  some_identifier val_a val_b val_c abs_max
  <chr>           <dbl> <dbl> <dbl>   <dbl>
1 apple              -1    -3     2      -3
2 tunafish            2     3     3       3
3 turkey_sandwich     0    NA     1       1

The abs_max column is what I want to create. A less-than-optimal solution would be to loop through each row, but I wanted to ask whether there is a better method.

markus
cavamic

1 Answer

Here is a way using max.col - thanks to @Gregor

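f <- function(data) {
  # keep only the numeric columns
  tmp <- Filter(is.numeric, data)
  # tibbles cannot be indexed with a matrix, so convert them first
  if (inherits(data, "tbl_df")) {
    tmp <- as.matrix(tmp)
  }
  # pair each row number with the column holding the largest absolute value;
  # NA is replaced by -Inf so that it is never picked as the maximum
  tmp[cbind(seq_len(nrow(tmp)),
            max.col(replace(x <- abs(tmp), is.na(x), -Inf)))]
}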

f(test_df)
# [1] -3  3  1

step by step

In the first step, we keep only the numeric columns

Filter(is.numeric, test_df)
#  val_a val_b val_c
#1    -1    -3     2
#2     2     3     3
#3     0    NA     1

(called tmp in the function above)

Then

replace(x <- abs(Filter(is.numeric, test_df)), is.na(x), -Inf)

returns

#  val_a val_b val_c
#1     1     3     2
#2     2     3     3
#3     0  -Inf     1

that is a data.frame where NAs were replaced with -Inf and all negative values were replaced with their absolute value.

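max.col returns, for each row, the column position of the maximum value. By default it breaks ties at random, so for row 2 (where val_b and val_c are both 3) it may return 2 or 3; the extracted value is 3 either way.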

max.col(replace(x <- abs(Filter(is.numeric, test_df)), is.na(x), -Inf))
# [1] 2 2 3
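
The tie-breaking matters when two values in a row have the same absolute value but opposite signs. A minimal sketch, using a toy matrix m whose values are illustrative and not taken from the question, of how the ties.method argument of max.col makes the choice reproducible:

```r
# Toy matrix (illustrative values only): both rows contain a tie
# in absolute value; in row 1 the tied values differ in sign.
m <- matrix(c(-3, 3, 1,
               2, 3, 3), nrow = 2, byrow = TRUE)
a <- abs(m)

first_idx <- max.col(a, ties.method = "first")  # leftmost maximum per row
last_idx  <- max.col(a, ties.method = "last")   # rightmost maximum per row

# Extract the signed values at the chosen positions
first_val <- m[cbind(seq_len(nrow(m)), first_idx)]
last_val  <- m[cbind(seq_len(nrow(m)), last_idx)]

first_val  # row 1 yields -3
last_val   # row 1 yields  3
```

With the default ties.method = "random", row 1 could yield either -3 or 3 from one call to the next.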

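Finally, these column positions are combined with the row numbers into a two-column numeric matrix, in which each row is a (row, column) pair. Indexing Filter(is.numeric, test_df) with this matrix extracts the desired values, i.e.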

cbind(1:nrow(Filter(is.numeric, test_df)),
      max.col(replace(x <- abs(Filter(is.numeric, test_df)), is.na(x), -Inf)))
#     [,1] [,2]
#[1,]    1    2
#[2,]    2    2
#[3,]    3    3
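
To make the last step concrete, here is a small sketch of indexing with a two-column matrix. The numeric columns of the example data are typed in as a plain matrix, and the names m, idx and picked are chosen only for this illustration:

```r
# The numeric columns of the example data as a plain matrix
m <- matrix(c(-1, -3, 2,
               2,  3, 3,
               0, NA, 1), nrow = 3, byrow = TRUE)

# Each row of idx is one (row, column) pair
idx <- cbind(1:3, c(2, 2, 3))

# Indexing with a two-column matrix returns one element per pair
picked <- m[idx]
picked
# [1] -3  3  1
```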

data

test_df <- data.frame(
  some_identifier = c("apple", "tunafish", "turkey_sandwich"), 
  val_a =  c(-1, 2, 0), 
  val_b = c(-3, 3, NA), 
  val_c = c(2, 3, 1), stringsAsFactors = FALSE)
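
For comparison with the max.col approach, the following is a rough sketch of a pmax-style helper, since the question asks for an equivalent of pmax. The name abs_pmax is hypothetical and this is not part of the answer above. It assumes equal-length numeric vectors and, on an exact tie such as -3 and 3, keeps the positive value.

```r
# Hypothetical helper (not from the answer): for each position, return the
# signed value that has the largest absolute value, ignoring NA.
abs_pmax <- function(...) {
  hi <- pmax(..., na.rm = TRUE)  # largest signed value per position
  lo <- pmin(..., na.rm = TRUE)  # smallest signed value per position
  # The absolute maximum is whichever extreme lies further from zero
  ifelse(abs(lo) > hi, lo, hi)
}

example_df <- data.frame(
  val_a = c(-1, 2, 0),
  val_b = c(-3, 3, NA),
  val_c = c(2, 3, 1)
)

result <- with(example_df, abs_pmax(val_a, val_b, val_c))
result
# [1] -3  3  1
```

Because pmax and pmin work element-wise on whole vectors, this avoids both the row loop and the conversion to a matrix, at the cost of having to pass the columns explicitly.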
markus
  • Thank you - this is the type of practical solution I was looking for. I wasn't aware of the max.col function. Tibble's behavior is unexpected but good to know (I typically work in tidyverse).
    ```
    > mtcars[cbind(1:3, 4:6)]
    [1] 110.00   3.90   2.32
    > dplyr::as_tibble(mtcars)[cbind(1:3, 4:6)]
    Error: Must use a vector in `[`, not an object of class matrix.
    ```
    – cavamic Aug 16 '19 at 10:55