Filtering rows where all columns contain the same data in R

Question

I have a dataset where each participant was asked to respond to a series of questions. Some gave the same response to every question, and I want to discard these participants' data, but I'm not sure how to go about it.

I could try filter/distinct, but there are approximately 100 questions (so 100 columns), and I'm not sure what syntax makes R check all of them.

Essentially my dataset looks like this

id q1 q2 q3 q4 q5 
1  7  4  2  3  5
2  5  7  2  6  1
3  1  1  1  1  1
4  4  7  8  2  3

And after tidying it up, it should look like this

id q1 q2 q3 q4 q5 
1  7  4  2  3  5
2  5  7  2  6  1
4  4  7  8  2  3

Any help is much appreciated!

non-tibbliverse approach could be `df[rowSums(df[-1] == df[,2]) != ncol(df[-1]), ]` — David Arenburg, Oct 24 '17 at 09:19
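
Unpacked into steps, the one-liner in the comment above might read as follows. This is only a sketch on the sample data from the question; the intermediate names `answers`, `matches` and `n_matching` are made up for illustration.

```r
# Sample data copied from the question
df <- data.frame(
  id = 1:4,
  q1 = c(7, 5, 1, 4),
  q2 = c(4, 7, 1, 7),
  q3 = c(2, 2, 1, 8),
  q4 = c(3, 6, 1, 2),
  q5 = c(5, 1, 1, 3)
)

answers <- df[-1]              # question columns only (id dropped)
matches <- answers == df[, 2]  # logical matrix: does each answer equal q1 in the same row?
n_matching <- rowSums(matches) # number of answers per row that equal q1

# Keep rows where at least one answer differs from q1
kept <- df[n_matching != ncol(answers), ]
print(kept)
```

Because the comparison is vectorised over the whole data frame, this avoids a row-by-row loop, which may help with around 100 columns and many participants.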

Paul Hiemstra · Answer 1 · 2017-10-24T09:27:03.583

2

You can check whether all of a participant's answers were the same using:

# df[-1] drops the id column so that only the answers are compared
all_the_same <- apply(df[-1], 1, function(x) all(x == x[1]))

and then simply throw those rows away:

df[!all_the_same, ]

or using dplyr:

library(dplyr)
df %>% filter(!all_the_same)
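
For anyone who wants to try this end to end, here is a minimal base R sketch on the sample data from the question. Note that it restricts the check to the question columns with `df[-1]`; if the id column were included, each answer would be compared against the id value instead of the first answer. The name `result` is just illustrative.

```r
# Sample data copied from the question
df <- data.frame(
  id = c(1, 2, 3, 4),
  q1 = c(7, 5, 1, 4),
  q2 = c(4, 7, 1, 7),
  q3 = c(2, 2, 1, 8),
  q4 = c(3, 6, 1, 2),
  q5 = c(5, 1, 1, 3)
)

# For each row, compare every answer with the first answer in that row.
# df[-1] drops the id column so it does not take part in the comparison.
all_the_same <- apply(df[-1], 1, function(x) all(x == x[1]))

# Keep only the participants whose answers are not all identical
result <- df[!all_the_same, ]
print(result)
```

With roughly 100 question columns the same code should work unchanged, as long as id is the only non-question column and it comes first.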

edited Oct 24 '17 at 09:27

answered Oct 24 '17 at 09:19

Paul Hiemstra

See also [this SO answer](https://stackoverflow.com/questions/4752275/test-for-equality-among-all-elements-of-a-single-vector) for alternative options for determining if a vector has all the same values. For example `length(unique(x)) == 1`. – Paul Hiemstra Oct 24 '17 at 09:28
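
As a rough sketch of how the `length(unique(x)) == 1` variant from the comment above could be applied row-wise (base R only, on the sample data from the question; `one_value` and `kept` are illustrative names):

```r
# Sample data copied from the question
df <- data.frame(
  id = 1:4,
  q1 = c(7, 5, 1, 4),
  q2 = c(4, 7, 1, 7),
  q3 = c(2, 2, 1, 8),
  q4 = c(3, 6, 1, 2),
  q5 = c(5, 1, 1, 3)
)

# TRUE for rows whose answers collapse to a single distinct value
one_value <- apply(df[-1], 1, function(x) length(unique(x)) == 1)

# Drop those rows
kept <- df[!one_value, ]
print(kept)
```

One difference that may matter with real survey data: `unique()` counts NA as a value of its own, so a row with a missing answer is not flagged as constant, whereas `all(x == x[1])` can return NA for such a row.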

score 0 · Answer 2 · answered Oct 24 '17 at 09:53

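Here is one approach with paste0 and gsub. It assumes every response is a single character (for example a one-digit score). The pattern is anchored with ^ and $ so that only rows made up of one repeated character are emptied and dropped.

df[nchar(gsub("^(.)\\1+$", "", do.call(paste0, df[-1]))) > 0, ]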
#   id q1 q2 q3 q4 q5
#1  1  7  4  2  3  5
#2  2  5  7  2  6  1
#4  4  4  7  8  2  3
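
To see why anchoring the pattern matters for this approach, the sketch below runs an unanchored `(.)\\1+` and an anchored `^(.)\\1+$` on a few hypothetical pasted rows. The strings in `pasted` are made up for illustration and are not from the question's data.

```r
# Hypothetical pasted rows: constant, two runs of repeats, no repeats
pasted <- c("11111", "11222", "74235")

# Unanchored: every run of a repeated character is deleted,
# so "11222" is emptied even though its answers are not all the same
unanchored <- gsub("(.)\\1+", "", pasted)

# Anchored: only a string made of one repeated character is emptied
anchored <- gsub("^(.)\\1+$", "", pasted)

# TRUE means the row would be kept
keep_unanchored <- nchar(unanchored) > 0
keep_anchored <- nchar(anchored) > 0
print(data.frame(pasted, keep_unanchored, keep_anchored))
```

Either way, the trick relies on single-character answers; with multi-digit responses the pasted string becomes ambiguous, and a row-wise comparison is probably the safer choice.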

answered Oct 24 '17 at 09:53

akrun
