9

Is there a way to select the pmax/pmin of a data frame with multiple columns??

I want only the max or min returned, not the entire row.

max <- tail(df, n=1)
max
#                       v1     v2     v3     v4     v5     v6     v7     v8
#2014-10-03 17:35:00  58.91  45.81  33.06  70.76  36.39  45.53  33.52  34.36

pmax(max)
#                       v1     v2     v3     v4     v5     v6     v7     v8
#2014-10-03 17:35:00  58.91  45.81  33.06  70.76  36.39  45.53  33.52  34.36

For this row, I expect a return value of :

70.76

...as it is the maximum value across all the columns.

Jaap
  • 81,064
  • 34
  • 182
  • 193
user3773444
  • 333
  • 3
  • 12

3 Answers3

13

Use do.call to call pmax to compare all the columns together for each row value, e.g.:

dat <- data.frame(a=1:5,b=rep(3,5))

#  a b
#1 1 3
#2 2 3
#3 3 3
#4 4 3
#5 5 3

do.call(pmax,dat)
#[1] 3 3 3 4 5

When you call pmax on an entire data.frame directly, it only has one argument passed to the function and nothing to compare it to. So, it just returns the supplied argument as it must be the maximum. It works for non-numeric and numeric arguments, even though it may not make much sense:

pmax(7)
#[1] 7

pmax("a")
#[1] "a"

pmax(data.frame(1,2,3))
#  X1 X2 X3
#1  1  2  3

Using do.call(pmax,...) with a data.frame means you pass each column of the data.frame as a list of arguments to pmax:

do.call(pmax,dat) 

is thus equivalent to:

pmax(dat$a, dat$b)
thelatemail
  • 91,185
  • 12
  • 128
  • 188
3

A dplyr way if you like

df <- structure(list(X = c("A", "B", "C", "D", "E", "F", "G", "H", 
                           "I", "J"), Y_1 = c(34, 29, 94, 53, 84, 53, 92, 41, 49, 32), Y_2 = c(43, 
                                                                                               26, 29, 64, 68, 88, 36, 86, 74, 15), Y_3 = c(62, 72, 69, 54, 
                                                                                                                                            80, 49, 51, 41, 46, 68), Y_4 = c(92, 45, 21, 27, 60, 32, 40, 
                                                                                                                                                                             40, 39, 89), Y_5 = c(28, 79, 34, 84, 58, 16, 69, 53, 78, 80), 
                     Y_6 = c(91, 55, 45, 70, 81, 19, 33, 90, 28, 82), Y_7 = c(95, 
                                                                              75, 11, 81, 12, 38, 53, 88, 74, 51), Y_8 = c(69, 99, 44, 
                                                                                                                           20, 53, 57, 79, 45, 21, 47), Y_9 = c(67, 44, 88, 75, 76, 
                                                                                                                                                                70, 18, 80, 32, 83), Y_10 = c(16, 80, 41, 47, 72, 47, 89, 
                                                                                                                                                                                              96, 23, 64)), row.names = c(NA, -10L), class = c("tbl_df", 
                                                                                                                                                                                                                                               "tbl", "data.frame"))
df
#> # A tibble: 10 x 11
#>    X       Y_1   Y_2   Y_3   Y_4   Y_5   Y_6   Y_7   Y_8   Y_9  Y_10
#>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 A        34    43    62    92    28    91    95    69    67    16
#>  2 B        29    26    72    45    79    55    75    99    44    80
#>  3 C        94    29    69    21    34    45    11    44    88    41
#>  4 D        53    64    54    27    84    70    81    20    75    47
#>  5 E        84    68    80    60    58    81    12    53    76    72
#>  6 F        53    88    49    32    16    19    38    57    70    47
#>  7 G        92    36    51    40    69    33    53    79    18    89
#>  8 H        41    86    41    40    53    90    88    45    80    96
#>  9 I        49    74    46    39    78    28    74    21    32    23
#> 10 J        32    15    68    89    80    82    51    47    83    64

library(dplyr)

df %>% rowwise() %>%
  mutate(max_val = max(c_across(where(is.numeric))))
#> # A tibble: 10 x 12
#> # Rowwise: 
#>    X       Y_1   Y_2   Y_3   Y_4   Y_5   Y_6   Y_7   Y_8   Y_9  Y_10 max_val
#>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
#>  1 A        34    43    62    92    28    91    95    69    67    16      95
#>  2 B        29    26    72    45    79    55    75    99    44    80      99
#>  3 C        94    29    69    21    34    45    11    44    88    41      94
#>  4 D        53    64    54    27    84    70    81    20    75    47      84
#>  5 E        84    68    80    60    58    81    12    53    76    72      84
#>  6 F        53    88    49    32    16    19    38    57    70    47      88
#>  7 G        92    36    51    40    69    33    53    79    18    89      92
#>  8 H        41    86    41    40    53    90    88    45    80    96      96
#>  9 I        49    74    46    39    78    28    74    21    32    23      78
#> 10 J        32    15    68    89    80    82    51    47    83    64      89

Created on 2021-05-08 by the reprex package (v2.0.0)

AnilGoyal
  • 25,297
  • 4
  • 27
  • 45
3

Thanks to AnilGovyl for the data!

Here is another dplyr option, using rlang and quasiquotation: learned here: dplyr mutate rowwise max of range of columns answer by Ben

First we get all columns to test and the we can use !!! and rlang::syms to compute the parallel max for every row of those columns:

rlang::syms takes a string input (the column names), and turns it into a symbol !!! unquotes and splices its argument, here the column names.

library(dplyr)
library(rlang)

df_cols <- df %>% select(where(is.numeric)) %>% names()

df %>% 
mutate(max_val=pmax(!!!rlang::syms(df_cols)))

Output:

   X       Y_1   Y_2   Y_3   Y_4   Y_5   Y_6   Y_7   Y_8   Y_9  Y_10 max_val
   <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
 1 A        34    43    62    92    28    91    95    69    67    16      95
 2 B        29    26    72    45    79    55    75    99    44    80      99
 3 C        94    29    69    21    34    45    11    44    88    41      94
 4 D        53    64    54    27    84    70    81    20    75    47      84
 5 E        84    68    80    60    58    81    12    53    76    72      84
 6 F        53    88    49    32    16    19    38    57    70    47      88
 7 G        92    36    51    40    69    33    53    79    18    89      92
 8 H        41    86    41    40    53    90    88    45    80    96      96
 9 I        49    74    46    39    78    28    74    21    32    23      78
10 J        32    15    68    89    80    82    51    47    83    64      89
TarJae
  • 72,363
  • 6
  • 19
  • 66
  • 1
    You may also use `max.col` to get `colnames` where maximum value is – AnilGoyal May 08 '21 at 09:44
  • 1
    You can make this more concise by doing the select from the pipe `df %>% mutate(max_val = pmax(!!!select(., where(is.numeric))))` – stlba May 10 '23 at 17:25