I have a huge dataframe, but here is a very simplified example:

df <- data.frame(Id=c(rep("Mike",8)), Year=c(rep("2015",2),rep("2016",3),
             rep("2017",3)),location=c(rep("A",2),rep("B",3),"D","E","E"))
df
    #  Id   Year location
    #1 Mike 2015        A
    #2 Mike 2015        A
    #3 Mike 2016        B
    #4 Mike 2016        B
    #5 Mike 2016        B
    #6 Mike 2017        D
    #7 Mike 2017        E
    #8 Mike 2017        E

My grouping criteria are Id and Year, so a specific group (e.g., Mike 2017) can contain many rows. I want to remove all rows of any group in which the "location" values are not all equal.

In this case, the only group in which the locations are not all the same is "Mike 2017", so I want to end up with a data frame like this:

#    Id Year location
#1 Mike 2015        A
#2 Mike 2015        A
#3 Mike 2016        B
#4 Mike 2016        B
#5 Mike 2016        B

Is there a way to do this by indicating the grouping criteria and the exclusion criteria described above?

Ronak Shah
  • This is almost a duplicate of https://stackoverflow.com/questions/31649049/select-groups-with-more-than-one-distinct-value – Ronak Shah Jul 30 '18 at 10:14

1 Answer

We can group by Id and Year and keep only the groups that have a single distinct location value:

library(dplyr)
df %>%
   group_by(Id, Year) %>%
   filter(n_distinct(location) == 1) %>%
   # drop the grouping afterwards (suggested by @AntoniosK in the comments)
   ungroup()

#    Id  Year  location
#  <fct> <fct> <fct>   
#1 Mike  2015  A       
#2 Mike  2015  A       
#3 Mike  2016  B       
#4 Mike  2016  B       
#5 Mike  2016  B     

The base R version using ave would be

df[with(df, ave(as.character(location), Id, Year, FUN = function(x) length(unique(x)))) == 1, ]

As @AntoniosK mentioned, ave does not work here when location is a factor, so it is converted with as.character inside the call. If you convert the column itself instead, you can convert it back to a factor afterwards.
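
A minimal, self-contained sketch of the base R route on factor data, assuming the example df from the question. The names loc_chr, n_loc and res are made up for illustration. It counts distinct values on a character copy of location and then uses droplevels() to tidy the factor levels of the result:

```r
# Rebuild the example data with factor columns, as in the question
# (stringsAsFactors = TRUE was the default before R 4.0).
df <- data.frame(
  Id = rep("Mike", 8),
  Year = c(rep("2015", 2), rep("2016", 3), rep("2017", 3)),
  location = c(rep("A", 2), rep("B", 3), "D", "E", "E"),
  stringsAsFactors = TRUE
)

# Work on a character copy so ave() can store the per-group counts.
loc_chr <- as.character(df$location)
n_loc <- ave(loc_chr, df$Id, df$Year,
             FUN = function(x) length(unique(x)))

# ave() returns a character vector here, so compare against "1".
res <- df[n_loc == "1", ]

# Optionally drop factor levels that no longer occur in the result.
res <- droplevels(res)
res
```

Here res keeps the five rows for 2015 and 2016, and location is still a factor, now with only the levels A and B.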

Ronak Shah
  • The `dplyr` solution will keep the grouping if you don't use `ungroup()` and the base R solution seems not to work with factor variables, but with character variables. – AntoniosK Jul 30 '18 at 10:04