Testing whether values across multiple columns are the same using dplyr

Question

I have some columns in a data frame that are character variables. The code below gives a two-row example of what the columns I’m interested in might look like:

a <- rep('Agree', 20)
b <- rep(c('Disagree', 'Agree'), 10)
dat <- data.frame(rbind(a,b), stringsAsFactors = FALSE)

I want to identify all the rows where each of the columns has the same value. For example, using dplyr mutate, I would like to create a new variable called ‘allSame’ where the value in the first row of 'dat' would be ‘yes’ and the value in the second row would be ‘no’.

I would also like to do this indexing the columns by number rather than name, because some of the variables have very long names and there are multiple sets of columns in the dataframe that I’d like to do this for.

`dat` is a matrix, not a data frame. Which class do you intend to use? If you want to use a data frame, `dat <- data.frame(a = a, b = b, stringsAsFactors = FALSE)` is one way to create a data frame. — jazzurro, Dec 03 '17 at 07:38
@jazzurro Thanks for pointing out that mistake, I have edited to clarify that I am working with a data frame — userLL, Dec 03 '17 at 07:50
You have a wide format of data right now. So you have 20 columns. Is this correct? — jazzurro, Dec 03 '17 at 08:05
Yes, the data are in wide format. 20 columns was an example, there are multiple sets of columns in the data that I would like to use this approach with. — userLL, Dec 03 '17 at 08:31
Related: [Count the number of rows where all columns have identical values](https://stackoverflow.com/questions/45948770/count-the-number-of-rows-where-all-columns-have-identical-values) (just skip the `sum` part). — Henrik, Dec 03 '17 at 10:28

score 1 · Accepted Answer · answered Dec 03 '17 at 08:49

The following is one way to check whether every answer in a row is the same (i.e., all Agree or all Disagree). I created a minimal sample and did the following. You want to check whether each row contains only "Agree" or only "Disagree", so you can use a logical check: mydf == "Agree" returns a logical matrix of TRUE and FALSE. With rowSums(), you can count how many TRUE values each row has. If the count equals ncol(mydf), which is 3 in this case, the row contains only "Agree". If the count is 0, the row contains only "Disagree". I assume these are the cases you want to mark as yes, so TRUE in allSame means yes.

mydf <- data.frame(col1 = c("Agree", "Agree", "Disagree"),
                   col2 = c("Agree", "Disagree", "Disagree"),
                   col3 = c("Agree", "Disagree", "Disagree"),
                   stringsAsFactors = FALSE)

#      col1     col2     col3
#1    Agree    Agree    Agree
#2    Agree Disagree Disagree
#3 Disagree Disagree Disagree

library(dplyr)

mydf %>%
  mutate(allSame = (rowSums(mydf == "Agree") == 0 |
                    rowSums(mydf == "Agree") == ncol(mydf)))

#      col1     col2     col3 allSame
#1    Agree    Agree    Agree    TRUE
#2    Agree Disagree Disagree   FALSE
#3 Disagree Disagree Disagree    TRUE
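
To make the logic of the answer easier to follow, here is a small base R sketch that breaks the check into its intermediate steps. The names is_agree, n_agree and all_same are only illustrative and do not appear in the answer itself.

```r
# Same sample data as in the answer
mydf <- data.frame(col1 = c("Agree", "Agree", "Disagree"),
                   col2 = c("Agree", "Disagree", "Disagree"),
                   col3 = c("Agree", "Disagree", "Disagree"),
                   stringsAsFactors = FALSE)

# Step 1: logical matrix, one cell per answer (TRUE where the cell is "Agree")
is_agree <- mydf == "Agree"

# Step 2: number of "Agree" answers in each row
n_agree <- rowSums(is_agree)

# Step 3: a row is uniform if it has no "Agree" at all or nothing but "Agree"
all_same <- n_agree == 0 | n_agree == ncol(mydf)

print(n_agree)
print(all_same)
```

Storing the row counts once also avoids calling rowSums() twice on the same comparison.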

Given the above, you would do:

dat %>%
  mutate(allSame = (rowSums(dat == "Agree") == 0 |
                    rowSums(dat == "Agree") == ncol(dat)))
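
The check above relies on the columns containing only "Agree" and "Disagree", and it uses every column of the data frame. As a possible variation (not part of the original answer), the sketch below compares each selected column with the first selected one, picks the columns by position as the question asks, and returns 'yes'/'no'. The helper name all_same_rows, the sample columns q1 to q3 and other, and the "Neutral" value are hypothetical and only serve the illustration.

```r
# Hypothetical helper: 'yes' if all selected columns agree within a row, else 'no'.
# cols holds column positions, so long column names never need to be typed.
all_same_rows <- function(df, cols) {
  m <- as.matrix(df[cols])                       # selected columns as a character matrix
  same <- unname(rowSums(m == m[, 1]) == ncol(m)) # every column equals the first one
  ifelse(same, "yes", "no")                      # a row containing NA yields NA
}

# Made-up sample data with a third answer value and an unrelated column
dat <- data.frame(q1 = c("Agree", "Agree", "Neutral"),
                  q2 = c("Agree", "Disagree", "Disagree"),
                  q3 = c("Agree", "Disagree", "Neutral"),
                  other = c(1, 2, 3),
                  stringsAsFactors = FALSE)

dat$allSame <- all_same_rows(dat, 1:3)
print(dat)
```

Inside a dplyr pipeline the same helper could presumably be called as dat %>% mutate(allSame = all_same_rows(., 1:3)), and it can be reused with a different vector of positions for each set of columns.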

struggles · Answer 2 · 2017-12-03T09:02:52.913

0

Use sapply if you want to iterate through each row independently. It might be worth looking up functionals.

library(dplyr)

a <- rep('Agree', 20)
b <- rep(c('Disagree', 'Agree'), 10)
df <- data.frame(a, b, stringsAsFactors = FALSE)

# Compare the two columns row by row
df <- mutate(df, same = sapply(1:nrow(df), function(i){
  if(a[i] == b[i]){'yes'} else {'no'}
}))

To rename the columns by index, use names():

names(df) <- paste0('index_', seq_along(names(df)))

edited Dec 03 '17 at 09:02

answered Dec 03 '17 at 08:15


I am afraid your `df` has only two columns. That is not correct according to the OP. – jazzurro Dec 03 '17 at 08:38
