1

I have a very large data set with results and dates. A small subset of the data (I have many more columns with different names and rows):

  result_1    date_1 result_2    date_2 result_3    date_3 result_4    date_4
1        1 12.8.2020        4 13.8.2020        2 15.8.2020        1 20.8.2020
2        3 15.8.2020        3 14.8.2020        5 17.8.2020        2 21.8.2020

I want to add a maximum column and the maximum name column for the result columns, depending on the column name (I'm using regex since not all the columns are properly named) I've tried a few options, the one that seems to work is to create another data frame while selecting columns that contain a regex that I choose. I tried something similar to the following code:

data_max <- data %>% 
  select(matches("result_")) %>% 
  rowwise() %>% 
  mutate(max = max(.))

My idea was to then join the max column with the original data and find the column name but I'm sure there's a better way to do it, especially since my data contains other column names (different regex) which I also want to maximize by row and plenty of na's in some of the columns.

Amidavid
  • 177
  • 7

1 Answers1

1

You can select 'result' column and use max.col :

cols <- grep('result', names(df), value = TRUE)
df$max_column <- cols[max.col(df[cols], ties.method = 'first')]
df

#  result_1    date_1 result_2    date_2 result_3    date_3 result_4    date_4 max_column
#1        1 12.8.2020        4 13.8.2020        2 15.8.2020        1 20.8.2020   result_2
#2        3 15.8.2020        3 14.8.2020        5 17.8.2020        2 21.8.2020   result_3

This gives the column name of maximum value in each row for 'result' columns.

data

df <- structure(list(result_1 = c(1L, 3L), date_1 = c("12.8.2020", 
"15.8.2020"), result_2 = 4:3, date_2 = c("13.8.2020", "14.8.2020"
), result_3 = c(2L, 5L), date_3 = c("15.8.2020", "17.8.2020"), 
    result_4 = 1:2, date_4 = c("20.8.2020", "21.8.2020")), 
class = "data.frame", row.names = c(NA, -2L))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213