2

I have the following table:

col1 col2 col3 col4
1 2 1 4
5 6 6 3

My goal is to find the max value per each row, and then find how many times it was repeated in the same row.

The resulting table should look like this:

col1 col2 col3 col4 max_val repetition
1 2 1 4 4 1
5 6 6 3 6 2

Now to achieve this, I am doing the following for Max:

df%>% rowwise%>%
mutate(max=max(col1:col4))

However, I am struggling to find the repetition. My idea is to use this pseudo code in mutate: sum( "select current row entirely or only for some columns"==max). But I don't know how to select entire row or only some columns of it and use its content to do the check, i.e.: is it equal to the max. How can we do this in dplyr?

GitZine
  • 445
  • 3
  • 14
  • 1
    Does this answer your question? [dplyr mutate rowwise max of range of columns](https://stackoverflow.com/questions/32978458/dplyr-mutate-rowwise-max-of-range-of-columns) – Clemsang Dec 02 '22 at 15:43
  • It only partially answers the question as my goal here is not actually to calculate the max but to find a way to select all/specific columns in rowwise operations. – GitZine Dec 02 '22 at 16:09

3 Answers3

4

A dplyr approach:

library(dplyr)
df %>% 
  rowwise() %>% 
  mutate(max_val = max(across(everything())),
         repetition = sum(across(col1:col4) == max_val))

# A tibble: 2 × 6
# Rowwise: 
   col1  col2  col3  col4 max_val repetition
  <int> <int> <int> <int>   <int>      <int>
1     1     2     1     4       4          1
2     5     6     6     3       6          2

An R base approach:

df$max_val <- apply(df,1,max)
df$repetition <- rowSums(df[, 1:4] == df[, 5])
Jilber Urbina
  • 58,147
  • 10
  • 114
  • 138
1

For other (non-tidyverse) readers, a base R approach could be:

df$max_val  <- apply(df, 1, max)
df$repetition <- apply(df, 1, function(x) sum(x[1:4] == x[5]))

Output:

#   col1 col2 col3 col4 max_val repetition
# 1    1    2    1    4       4          1
# 2    5    6    6    3       6          2
jpsmith
  • 11,023
  • 5
  • 15
  • 36
1

Although dplyr has added many tools for working across rows of data, it remains, in my mind at least, much easier to adhere to tidy principles and always convert the data to "long" format for these kinds of operations.

Thus, here is a tidy approach:

df %>%
    mutate(row = row_number()) %>%
    pivot_longer(cols = -row) %>%
    group_by(row) %>%
    mutate(max_val = max(value), repetitions = sum(value == max(value))) %>%
    pivot_wider(id_cols = c(row, max_val, repetitions)) %>%
    select(col1:col4, max_val, repetitions)

The last select() is just to get the columns in the order you want.

Curt F.
  • 4,690
  • 2
  • 22
  • 39