Return pmin or pmax of data.frame with multiple columns

Question

Is there a way to select the pmax/pmin of a data frame with multiple columns??

I want only the max or min returned, not the entire row.

max <- tail(df, n=1)
max
#                       v1     v2     v3     v4     v5     v6     v7     v8
#2014-10-03 17:35:00  58.91  45.81  33.06  70.76  36.39  45.53  33.52  34.36

pmax(max)
#                       v1     v2     v3     v4     v5     v6     v7     v8
#2014-10-03 17:35:00  58.91  45.81  33.06  70.76  36.39  45.53  33.52  34.36

For this row, I expect a return value of :

70.76

...as it is the maximum value across all the columns.

thelatemail · Accepted Answer · 2014-10-09T01:11:37.133

13

Use do.call to call pmax to compare all the columns together for each row value, e.g.:

dat <- data.frame(a=1:5,b=rep(3,5))

#  a b
#1 1 3
#2 2 3
#3 3 3
#4 4 3
#5 5 3

do.call(pmax,dat)
#[1] 3 3 3 4 5

When you call pmax on an entire data.frame directly, it only has one argument passed to the function and nothing to compare it to. So, it just returns the supplied argument as it must be the maximum. It works for non-numeric and numeric arguments, even though it may not make much sense:

pmax(7)
#[1] 7

pmax("a")
#[1] "a"

pmax(data.frame(1,2,3))
#  X1 X2 X3
#1  1  2  3

Using do.call(pmax,...) with a data.frame means you pass each column of the data.frame as a list of arguments to pmax:

do.call(pmax,dat)

is thus equivalent to:

pmax(dat$a, dat$b)

edited Oct 09 '14 at 01:11

answered Oct 09 '14 at 00:24

thelatemail

91,185
12
128
188

This does not work with NAs. How can I pass na.rm = TRUE to do.call? – JASC May 19 '19 at 23:23
1

@JASC - `do.call(pmax, c(dat, na.rm=TRUE))` – thelatemail May 20 '19 at 00:25

score 3 · Answer 2 · answered May 08 '21 at 08:49

A dplyr way if you like

df <- structure(list(X = c("A", "B", "C", "D", "E", "F", "G", "H", 
                           "I", "J"), Y_1 = c(34, 29, 94, 53, 84, 53, 92, 41, 49, 32), Y_2 = c(43, 
                                                                                               26, 29, 64, 68, 88, 36, 86, 74, 15), Y_3 = c(62, 72, 69, 54, 
                                                                                                                                            80, 49, 51, 41, 46, 68), Y_4 = c(92, 45, 21, 27, 60, 32, 40, 
                                                                                                                                                                             40, 39, 89), Y_5 = c(28, 79, 34, 84, 58, 16, 69, 53, 78, 80), 
                     Y_6 = c(91, 55, 45, 70, 81, 19, 33, 90, 28, 82), Y_7 = c(95, 
                                                                              75, 11, 81, 12, 38, 53, 88, 74, 51), Y_8 = c(69, 99, 44, 
                                                                                                                           20, 53, 57, 79, 45, 21, 47), Y_9 = c(67, 44, 88, 75, 76, 
                                                                                                                                                                70, 18, 80, 32, 83), Y_10 = c(16, 80, 41, 47, 72, 47, 89, 
                                                                                                                                                                                              96, 23, 64)), row.names = c(NA, -10L), class = c("tbl_df", 
                                                                                                                                                                                                                                               "tbl", "data.frame"))
df
#> # A tibble: 10 x 11
#>    X       Y_1   Y_2   Y_3   Y_4   Y_5   Y_6   Y_7   Y_8   Y_9  Y_10
#>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 A        34    43    62    92    28    91    95    69    67    16
#>  2 B        29    26    72    45    79    55    75    99    44    80
#>  3 C        94    29    69    21    34    45    11    44    88    41
#>  4 D        53    64    54    27    84    70    81    20    75    47
#>  5 E        84    68    80    60    58    81    12    53    76    72
#>  6 F        53    88    49    32    16    19    38    57    70    47
#>  7 G        92    36    51    40    69    33    53    79    18    89
#>  8 H        41    86    41    40    53    90    88    45    80    96
#>  9 I        49    74    46    39    78    28    74    21    32    23
#> 10 J        32    15    68    89    80    82    51    47    83    64

library(dplyr)

df %>% rowwise() %>%
  mutate(max_val = max(c_across(where(is.numeric))))
#> # A tibble: 10 x 12
#> # Rowwise: 
#>    X       Y_1   Y_2   Y_3   Y_4   Y_5   Y_6   Y_7   Y_8   Y_9  Y_10 max_val
#>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
#>  1 A        34    43    62    92    28    91    95    69    67    16      95
#>  2 B        29    26    72    45    79    55    75    99    44    80      99
#>  3 C        94    29    69    21    34    45    11    44    88    41      94
#>  4 D        53    64    54    27    84    70    81    20    75    47      84
#>  5 E        84    68    80    60    58    81    12    53    76    72      84
#>  6 F        53    88    49    32    16    19    38    57    70    47      88
#>  7 G        92    36    51    40    69    33    53    79    18    89      92
#>  8 H        41    86    41    40    53    90    88    45    80    96      96
#>  9 I        49    74    46    39    78    28    74    21    32    23      78
#> 10 J        32    15    68    89    80    82    51    47    83    64      89

^{Created on 2021-05-08 by the reprex package (v2.0.0)}

score 3 · Answer 3 · answered May 08 '21 at 09:20

Thanks to AnilGovyl for the data!

Here is another dplyr option, using rlang and quasiquotation: learned here: dplyr mutate rowwise max of range of columns answer by Ben

First we get all columns to test and the we can use !!! and rlang::syms to compute the parallel max for every row of those columns:

rlang::syms takes a string input (the column names), and turns it into a symbol !!! unquotes and splices its argument, here the column names.

library(dplyr)
library(rlang)

df_cols <- df %>% select(where(is.numeric)) %>% names()

df %>% 
mutate(max_val=pmax(!!!rlang::syms(df_cols)))

Output:

   X       Y_1   Y_2   Y_3   Y_4   Y_5   Y_6   Y_7   Y_8   Y_9  Y_10 max_val
   <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
 1 A        34    43    62    92    28    91    95    69    67    16      95
 2 B        29    26    72    45    79    55    75    99    44    80      99
 3 C        94    29    69    21    34    45    11    44    88    41      94
 4 D        53    64    54    27    84    70    81    20    75    47      84
 5 E        84    68    80    60    58    81    12    53    76    72      84
 6 F        53    88    49    32    16    19    38    57    70    47      88
 7 G        92    36    51    40    69    33    53    79    18    89      92
 8 H        41    86    41    40    53    90    88    45    80    96      96
 9 I        49    74    46    39    78    28    74    21    32    23      78
10 J        32    15    68    89    80    82    51    47    83    64      89

You may also use `max.col` to get `colnames` where maximum value is — AnilGoyal, May 08 '21 at 09:44
You can make this more concise by doing the select from the pipe `df %>% mutate(max_val = pmax(!!!select(., where(is.numeric))))` — stlba, May 10 '23 at 17:25

Return pmin or pmax of data.frame with multiple columns

3 Answers3

Linked

Related