
I have a dataset where each participant was asked to respond to a series of questions; some gave the same response to every question. I want to discard these participants' data, but I'm not sure how to go about doing that.

I could try filter/distinct, but the problem is that there are approximately 100 questions (so 100 columns), and I'm not sure what the syntax is to make R check all of them.

Essentially, my dataset looks like this:

id q1 q2 q3 q4 q5 
1  7  4  2  3  5
2  5  7  2  6  1
3  1  1  1  1  1
4  4  7  8  2  3

After tidying it up, it should look like this:

id q1 q2 q3 q4 q5 
1  7  4  2  3  5
2  5  7  2  6  1
4  4  7  8  2  3

Any help is much appreciated!

David Arenburg

2 Answers


You can check whether all of a participant's answers were the same using:

# Drop the id column (df[-1]) so that only the answers are compared
all_the_same = apply(df[-1], 1, function(x) all(x == x[1]))

and then simply throw those rows away:

df[!all_the_same,]

or using dplyr:

library(dplyr)
df %>% filter(!all_the_same)
Paul Hiemstra
See also [this SO answer](https://stackoverflow.com/questions/4752275/test-for-equality-among-all-elements-of-a-single-vector) for alternative ways to determine whether a vector has all the same values, for example `length(unique(x)) == 1`. – Paul Hiemstra Oct 24 '17 at 09:28
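As a minimal sketch of the `length(unique(x)) == 1` variant mentioned in the comment, here it is applied to the example data from the question. The helper name `is_constant` and the variable `result` are assumptions made for illustration. The `id` column is excluded so that only the answers are compared, and the data are assumed to contain no missing values.

```r
# Example data from the question
df <- data.frame(
  id = c(1, 2, 3, 4),
  q1 = c(7, 5, 1, 4),
  q2 = c(4, 7, 1, 7),
  q3 = c(2, 2, 1, 8),
  q4 = c(3, 6, 1, 2),
  q5 = c(5, 1, 1, 3)
)

# Hypothetical helper: TRUE when a vector holds exactly one distinct value
is_constant <- function(x) length(unique(x)) == 1

# Apply the helper row by row to the question columns only (id is dropped)
all_the_same <- apply(df[-1], 1, is_constant)

# Keep the participants whose answers were not all identical
result <- df[!all_the_same, ]
print(result)
```

With roughly 100 question columns the call stays the same, since `df[-1]` selects every column except the first.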

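Here is one approach with paste and gsub. It assumes every response is a single character; the pattern is anchored with ^ and $ so that only rows consisting of one repeated value are removed (without the anchors, a row such as 1 1 2 2 2 would also be dropped):

df[nchar(gsub("^(.)\\1+$", "", do.call(paste0, df[-1])))>0,]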
#   id q1 q2 q3 q4 q5
#1  1  7  4  2  3  5
#2  2  5  7  2  6  1
#4  4  4  7  8  2  3
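If the responses can have more than one digit (say, a 1 to 10 scale), pasting them into a string becomes ambiguous. A possible alternative, which is a different technique from the regex above and only a sketch, is to compare every answer column with the first one numerically. The names `m`, `differs` and `kept` are assumptions for illustration, and the data are assumed to contain no missing values.

```r
# Example data from the question
df <- data.frame(
  id = c(1, 2, 3, 4),
  q1 = c(7, 5, 1, 4),
  q2 = c(4, 7, 1, 7),
  q3 = c(2, 2, 1, 8),
  q4 = c(3, 6, 1, 2),
  q5 = c(5, 1, 1, 3)
)

# Answers as a numeric matrix, without the id column
m <- as.matrix(df[-1])

# Compare each column with the first answer column; m[, 1] has one value
# per row, so it is recycled down every column of m
differs <- m != m[, 1]

# Keep rows in which at least one answer differs from the first answer
kept <- df[rowSums(differs) > 0, ]
print(kept)
```

This avoids the per-row function call of `apply`, which may matter for large datasets.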
akrun