I have some columns in a dataframe that are character variables. The below gives a two-row example of what the columns I’m interested in might look like:

a <- rep('Agree', 20)
b <- rep(c('Disagree', 'Agree'), 10)
dat <- data.frame(rbind(a,b), stringsAsFactors = FALSE)

I want to identify all the rows where each of the columns has the same value. For example, using dplyr mutate, I would like to create a new variable called ‘allSame’ where the value in the first row of 'dat' would be ‘yes’ and the value in the second row would be ‘no’.

I would also like to do this indexing the columns by number rather than name, because some of the variables have very long names and there are multiple sets of columns in the dataframe that I’d like to do this for.

jazzurro
userLL
  • `dat` is a matrix, not a data frame. Which class do you intend to use? If you want to use a data frame, `dat <- data.frame(a = a, b = b, stringsAsFactors = FALSE)` is one way to create a data frame. – jazzurro Dec 03 '17 at 07:38
  • @jazzurro Thanks for pointing out that mistake, I have edited to clarify that I am working with a data frame – userLL Dec 03 '17 at 07:50
  • You have a wide format of data right now. So you have 20 columns. Is this correct? – jazzurro Dec 03 '17 at 08:05
  • Yes, the data are in wide format. 20 columns was an example, there are multiple sets of columns in the data that I would like to use this approach with. – userLL Dec 03 '17 at 08:31
  • Related: [Count the number of rows where all columns have identical values](https://stackoverflow.com/questions/45948770/count-the-number-of-rows-where-all-columns-have-identical-values) (just skip the `sum` part). – Henrik Dec 03 '17 at 10:28

2 Answers

The following is one way to check whether every column in a row holds the same answer (i.e., all "Agree" or all "Disagree"). I created a minimal sample to demonstrate. mydf == "Agree" returns a logical matrix of TRUE and FALSE values, and rowSums() counts the TRUE values in each row. If the count equals ncol(mydf), which is 3 in this case, the row contains only "Agree". If the count is 0, the row contains only "Disagree" (this assumes those are the only two possible values). I take both cases to mean yes, so TRUE in allSame means yes.

mydf <- data.frame(col1 = c("Agree", "Agree", "Disagree"),
                   col2 = c("Agree", "Disagree", "Disagree"),
                   col3 = c("Agree", "Disagree", "Disagree"),
                   stringsAsFactors = FALSE)

#      col1     col2     col3
#1    Agree    Agree    Agree
#2    Agree Disagree Disagree
#3 Disagree Disagree Disagree

library(dplyr)

mydf %>%
  mutate(allSame = (rowSums(mydf == "Agree") == 0 |
                    rowSums(mydf == "Agree") == ncol(mydf)))

#      col1     col2     col3 allSame
#1    Agree    Agree    Agree    TRUE
#2    Agree Disagree Disagree   FALSE
#3 Disagree Disagree Disagree    TRUE

Given the above, you would do:

dat %>%
  mutate(allSame = (rowSums(dat == "Agree") == 0 |
                    rowSums(dat == "Agree") == ncol(dat)))
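
The check above relies on there being only two possible answers. As a rough sketch of how it might be generalized to any set of values, with columns picked by position and the yes/no labels the question asks for, one option is to compare every selected column against the first one. The helper name all_same and its arguments are hypothetical (they are not part of dplyr or of the answer above), and the sketch assumes the selected columns contain no missing values:

```r
# Hypothetical helper: label each row "yes" when every selected column
# holds the same value, otherwise "no". Assumes no NA values.
all_same <- function(df, cols) {
  m <- as.matrix(df[cols])            # select columns by position
  # Compare each column with the first selected column, row by row;
  # a row with zero mismatches has identical values throughout.
  same <- rowSums(m != m[, 1]) == 0
  unname(ifelse(same, "yes", "no"))
}

# The two-row example from the question
a <- rep("Agree", 20)
b <- rep(c("Disagree", "Agree"), 10)
dat <- data.frame(rbind(a, b), stringsAsFactors = FALSE)

dat$allSame <- all_same(dat, 1:20)
dat$allSame
```

Because the helper returns a plain character vector, it should also work inside mutate(), e.g. mutate(dat, allSame = all_same(dat, 1:20)), and it can be called once per set of columns by changing the index vector.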
jazzurro

Use sapply() if you want to iterate through each row independently. It might be worth looking up functionals.

a <- rep('Agree', 20)
b <- rep(c('Disagree', 'Agree'), 10)
library(dplyr)

df <- data.frame(a, b, stringsAsFactors = FALSE)

# Compare the two columns row by row and label each row 'yes' or 'no'
df <- mutate(df, same = sapply(seq_len(nrow(df)), function(i) {
  if (a[i] == b[i]) 'yes' else 'no'
}))
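
The example above compares two named columns. As a minimal sketch of how the same row-by-row idea might extend to any number of columns selected by position, the snippet below uses vapply() and counts the distinct values in each row. The sample data frame and the index vector cols are assumptions made for illustration:

```r
# Sample data in wide format (illustrative only)
df <- data.frame(col1 = c("Agree", "Agree", "Disagree"),
                 col2 = c("Agree", "Disagree", "Disagree"),
                 col3 = c("Agree", "Disagree", "Disagree"),
                 stringsAsFactors = FALSE)

cols <- 1:3  # columns to compare, given by position rather than name

# For each row, collect the values of the selected columns and
# check whether they collapse to a single distinct value.
df$same <- vapply(seq_len(nrow(df)), function(i) {
  row_values <- unlist(df[i, cols], use.names = FALSE)
  if (length(unique(row_values)) == 1) "yes" else "no"
}, character(1))

df$same
```

Looping over rows like this is easy to read but is likely to be slower than a vectorised rowSums() comparison on large data frames.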

Renaming should use names():

names(df) <- paste0('index_', seq_along(df))
struggles