Keep levels of a factor containing minimum number of levels in another factor

Question

I have a data frame like so:

df <- data.frame(year = as.numeric(c(rep(1997, 5), rep(1998, 5), rep(1999, 5))),
                 sp   = c("A", "B", "C", "D", "E", "A", "B", "C", "F", "G", "H", "I", "J", "A", "B"))

I want to keep only the levels of sp that appear in a minimum number of distinct years. For this example, I want to keep the sp levels that have at least 2 years of data.

I have tried this:

library(dplyr)

df <- df %>%
  group_by(sp) %>%
  filter(length(year) >= 2)  # length() counts rows per sp, not distinct years

The correct output is:

output <- data.frame(year = c("1997", "1998", "1999", "1997", "1998", "1999", "1997", "1998"),
                     sp   = c("A", "A", "A", "B", "B", "B", "C", "C"))

What you have tried already gives your expected output; the rows are just in a different order. (Darren Tsai, Jan 27 '19 at 08:46)
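The attempt counts rows with length(year), which equals the number of distinct years only when a species never appears twice in the same year. That holds for this df, which is why the output matches. A small base R check on a made-up data frame dup (hypothetical, with species D recorded twice in 1997) shows where the two counts diverge:

```r
# Hypothetical data: species D has two rows, both in 1997
dup <- data.frame(year = c(1997, 1997, 1997, 1998),
                  sp   = c("D", "D", "A", "A"))

# Rows per species (what length(year) measures)
rows_per_sp <- tapply(dup$year, dup$sp, length)

# Distinct years per species (what the question asks for)
years_per_sp <- tapply(dup$year, dup$sp, function(y) length(unique(y)))

rows_per_sp   # A = 2, D = 2
years_per_sp  # A = 2, D = 1
```

In the dplyr pipeline, swapping length(year) for n_distinct(year) counts distinct years instead of rows.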

score 0 · Answer 1 · answered Jan 27 '19 at 08:30

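You could use aggregate() to count the rows per sp, merge the counts back onto the data, and keep the rows whose count is at least 2.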

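# count rows per sp (use function(x) length(unique(x)) instead of length to count distinct years)
df1 <- merge(df1, aggregate(list(count = df1$year), by = list(sp = df1$sp), length))
# keep sp seen at least twice; after merge() the columns are sp, year, count, so c(2, 1) gives year, sp
df1 <- df1[df1$count >= 2, c(2, 1)]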


Data

df1 <- structure(list(year = c(1997, 1998, 1999, 1998, 1999, 1997, 1998, 
1997), sp = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L), .Label = c("A", 
"B", "C", "D", "E", "F", "G", "H", "I", "J"), class = "factor")), row.names = c(NA, 
8L), class = "data.frame")
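The df1 shown under Data already contains only species A, B and C, so the code above has nothing left to drop. A sketch of the same aggregate() plus merge() idea applied to the question's df, counting distinct years rather than rows (the n_years column name is an assumption, not part of the answer):

```r
# Question's data
df <- data.frame(year = rep(c(1997, 1998, 1999), each = 5),
                 sp   = c("A", "B", "C", "D", "E",
                          "A", "B", "C", "F", "G",
                          "H", "I", "J", "A", "B"))

# Number of distinct years per species
counts <- aggregate(list(n_years = df$year), by = list(sp = df$sp),
                    FUN = function(y) length(unique(y)))

# Attach the counts to every row, then keep species seen in >= 2 years
res <- merge(df, counts, by = "sp")
res <- res[res$n_years >= 2, c("year", "sp")]

# Sort by species, then year, and reset the row names
res <- res[order(res$sp, res$year), ]
rownames(res) <- NULL
res
```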

score 0 · Answer 2 · Darren Tsai

A dplyr method:

library(dplyr)

df %>% group_by(sp) %>% filter(n() >= 2) %>% arrange(sp)

#    year sp   
#   <dbl> <fct>
# 1  1997 A    
# 2  1998 A    
# 3  1999 A    
# 4  1997 B    
# 5  1998 B    
# 6  1999 B    
# 7  1997 C    
# 8  1998 C
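If the threshold needs to vary, the filter can be wrapped in a small function. A base R sketch, where keep_min_years() and its min_years argument are hypothetical names; it uses ave() to count distinct years per species and keeps the original row order (sort afterwards with arrange(sp) or order() if you want the layout shown above):

```r
# Hypothetical helper: keep rows whose species occurs in at least
# `min_years` distinct years
keep_min_years <- function(data, min_years = 2) {
  # ave() returns one value per row: the distinct-year count of that row's sp
  n_years <- ave(data$year, data$sp, FUN = function(y) length(unique(y)))
  data[n_years >= min_years, , drop = FALSE]
}

df <- data.frame(year = rep(c(1997, 1998, 1999), each = 5),
                 sp   = c("A", "B", "C", "D", "E",
                          "A", "B", "C", "F", "G",
                          "H", "I", "J", "A", "B"))

keep_min_years(df, 2)  # 8 rows: A and B in three years, C in two
keep_min_years(df, 3)  # 6 rows: only A and B
```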

edited Jan 27 '19 at 08:47 · answered Jan 27 '19 at 08:31
