32

I have a data frame as shown below. I want to add a column holding the maximum of each row, but that maximum should ignore the value 9 if it is present in the row. How can I achieve that efficiently?

df <- data.frame(age=c(5,6,9), marks=c(1,2,7), story=c(2,9,1))
df$max <- apply(df, 1, max)    
df
Henrik
user2543622

4 Answers

26

Here's one possibility:

df$colMax <- apply(df, 1, function(x) max(x[x != 9]))
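One edge case worth knowing about: if a row contains nothing but 9s, `x[x != 9]` is empty and `max()` returns `-Inf` with a warning. Below is a minimal sketch that returns `NA` in that case instead. The helper name `max_without` and its `skip` argument are made up for illustration and are not part of the original answer.

```r
df <- data.frame(age = c(5, 6, 9), marks = c(1, 2, 7), story = c(2, 9, 1))

# Hypothetical helper: maximum of x after dropping `skip`;
# returns NA when nothing is left instead of -Inf plus a warning.
max_without <- function(x, skip = 9) {
  kept <- x[x != skip]
  if (length(kept) == 0) NA_real_ else max(kept)
}

# Compute the row maxima first, then attach them as a new column.
row_max <- apply(df, 1, max_without)
df$colMax <- row_max
```

For the example data this gives the same row maxima as the one-liner above; the two only differ for rows made up entirely of 9s.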
talat
  • Well, it does create extra copies. You could wrap the other answers in functions to hide that a copy is created. – Roland Jun 30 '14 at 19:19
  • @Roland ok, corrected. (I meant you don't end up with another data.frame and you don't need to define an extra function which removes it) – talat Jun 30 '14 at 19:21
  • **Side Note:** Using `vapply` offers significant boost especially for large DFs (i.e. `vapply(df, function(x) max(x[x != 9]), numeric(1))`) – user101 Feb 09 '20 at 10:48
22

The pmax function is useful here. The only catch is that it takes a set of vectors as separate arguments, so you need do.call to pass the columns of a data.frame as those arguments. I also set the 9 values to NA, as suggested by others, but do so using the somewhat unconventional is.na<- replacement function.

do.call(pmax, c(`is.na<-`(df, df==9), na.rm=TRUE))
# [1] 5 6 7
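The one-liner packs several steps together. As a rough sketch, it can be unpacked as follows; the variable names `mask`, `masked`, `args` and `result` are illustrative only.

```r
df <- data.frame(age = c(5, 6, 9), marks = c(1, 2, 7), story = c(2, 9, 1))

mask <- df == 9            # logical matrix, TRUE where a cell equals 9
masked <- df               # work on a copy so df itself keeps its 9s
is.na(masked) <- mask      # same effect as `is.na<-`(df, df == 9)

# c() on a data.frame yields a plain list of its columns,
# so na.rm = TRUE is simply appended as one more named argument.
args <- c(masked, na.rm = TRUE)

# Equivalent to pmax(age, marks, story, na.rm = TRUE)
result <- do.call(pmax, args)
```

Because the masking happens on a copy, `df` still contains its original values afterwards.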
MrFlick
4

Substitute 9 with NA and then use pmax as suggested by @MrFlick in his deleted answer:

df2 <- df #copy df because we are going to change it
df2[df2==9] <- NA
do.call(function(...) pmax(..., na.rm=TRUE), df2)
#[1] 5 6 7
Roland
  • Or: do.call(`pmax`, c(df2, na.rm=TRUE)) #[1] 5 6 7 – akrun Jun 30 '14 at 19:26
  • Why should we prefer pmax over max? – russellpierce Jul 02 '14 at 12:33
  • @rpierce I didn't say that, did I? These functions do different things. – Roland Jul 10 '14 at 10:47
  • You didn't. They both just seem to be used for the same purpose here, and I'm not a math guy who can interpret the help file to tell the difference, nor am I a trained programmer who can intuit which function is more efficient. Thus, I asked. – russellpierce Jul 10 '14 at 11:01
  • `do.call(pmax, DF)` gives the same result as `apply(DF, 1, max)` (provided `DF` is a data.frame with all numeric columns), but is faster by a factor of 100 on a data.frame with 2 columns and 1e4 rows. If you don't understand the documentation, look at the examples section and play around with the function. – Roland Jul 10 '14 at 11:10
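To make the difference discussed in these comments concrete: `max()` reduces all of its arguments to a single number, while `pmax()` compares its arguments element by element. The sketch below shows both, then checks that the two row-wise approaches agree on a data.frame of the size mentioned above. The data are random and the timings are left unquoted because they depend on the machine and R version.

```r
a <- c(1, 8, 3)
b <- c(4, 2, 6)

max(a, b)    # a single value: 8
pmax(a, b)   # one value per position: 4 8 6

# Row-wise maxima of an all-numeric data.frame, computed both ways
set.seed(1)
DF <- data.frame(x = runif(1e4), y = runif(1e4))
by_apply <- apply(DF, 1, max)   # calls max() once per row
by_pmax  <- do.call(pmax, DF)   # one vectorised call over the columns

# Compare values only; names are dropped so they cannot affect the check
all.equal(unname(by_apply), by_pmax)

# Rough timing comparison
system.time(apply(DF, 1, max))
system.time(do.call(pmax, DF))
```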
2
#make a copy of your data.frame
tmp.df <- df
#replace the 9s with NA
tmp.df[tmp.df==9] <- NA
#Use apply to process the data one row at a time through the max function, removing NA values first
apply(tmp.df,1,max,na.rm=TRUE)
russellpierce