
I have a data frame like so:

 df<-data.frame(year= as.numeric(c(rep(1997, 5), rep(1998, 5), rep(1999, 5))), 
       sp= c("A", "B", "C", "D", "E", "A", "B", "C", "F", "G", "H", "I", "J","A", "B"))

I want to keep only the values of sp that occur in at least a minimum number of distinct years. For this example, I want to keep every sp that has at least 2 years of data.

I have tried this:

df<-
 df %>% 
 group_by(sp) %>% 
 filter(length(year) >= 2)

The expected output is:

 output<- data.frame( year= c("1997", "1998", "1999","1997", "1998", "1999", "1997", "1998"), 
                 sp= c("A", "A", "A", "B", "B", "B", "C", "C"))
Danielle

2 Answers


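You could use aggregate() to count the rows per sp, merge the counts back onto the data, and keep the groups with a count of at least 2.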

df1 <- merge(df1, aggregate(list(count=df1$year), by=list(sp=df1$sp), length))
df1 <- df1[df1$count >= 2, c(2, 1)]

Result

> df1
  year sp
1 1997  A
2 1998  A
3 1999  A
4 1998  B
5 1999  B
6 1997  B
7 1998  C
8 1997  C

Data

df1 <- structure(list(year = c(1997, 1998, 1999, 1998, 1999, 1997, 1998, 
1997), sp = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L), .Label = c("A", 
"B", "C", "D", "E", "F", "G", "H", "I", "J"), class = "factor")), row.names = c(NA, 
8L), class = "data.frame")
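Since the question asks for distinct years rather than rows, a variant that counts unique years per sp may be safer when a species can appear more than once in the same year. The following is a minimal sketch under that assumption, run on the question's original df; the names counts, n_years and res are illustrative and not part of the original answer.

```r
# Question's data: 15 rows, one sp per row
df <- data.frame(
  year = c(rep(1997, 5), rep(1998, 5), rep(1999, 5)),
  sp   = c("A", "B", "C", "D", "E", "A", "B", "C", "F", "G",
           "H", "I", "J", "A", "B")
)

# Number of distinct years per sp (not the number of rows)
counts <- aggregate(year ~ sp, data = df,
                    FUN = function(x) length(unique(x)))
names(counts)[2] <- "n_years"

# Attach the counts to every row, then keep sp seen in >= 2 years
res <- merge(df, counts, by = "sp")
res <- res[res$n_years >= 2, c("year", "sp")]
res <- res[order(res$sp, res$year), ]
rownames(res) <- NULL
res
```

On this data the result has 8 rows covering A, B and C, the same as the row-count version, because no species is repeated within a year.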
jay.sf

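A dplyr method. Note that n() counts rows per group, which matches the number of years here only because each sp appears at most once per year:

library(dplyr)

df %>% group_by(sp) %>% filter(n() >= 2) %>% arrange(sp)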

#    year sp   
#   <dbl> <fct>
# 1  1997 A    
# 2  1998 A    
# 3  1999 A    
# 4  1997 B    
# 5  1998 B    
# 6  1999 B    
# 7  1997 C    
# 8  1998 C
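If the data can contain several rows for the same sp in the same year, counting distinct years with n_distinct() is probably closer to what the question describes. The sketch below illustrates the difference; the extra row for "D" and the names df2, by_rows and by_years are hypothetical additions for demonstration only.

```r
library(dplyr)

df <- data.frame(
  year = c(rep(1997, 5), rep(1998, 5), rep(1999, 5)),
  sp   = c("A", "B", "C", "D", "E", "A", "B", "C", "F", "G",
           "H", "I", "J", "A", "B")
)

# Hypothetical duplicate: "D" recorded twice in 1997
df2 <- rbind(df, data.frame(year = 1997, sp = "D"))

# n() counts rows, so "D" (2 rows, 1 year) is kept
by_rows <- df2 %>%
  group_by(sp) %>%
  filter(n() >= 2) %>%
  ungroup()

# n_distinct(year) counts unique years, so "D" is dropped
by_years <- df2 %>%
  group_by(sp) %>%
  filter(n_distinct(year) >= 2) %>%
  ungroup() %>%
  arrange(sp, year)

sort(unique(as.character(by_rows$sp)))   # "A" "B" "C" "D"
sort(unique(as.character(by_years$sp)))  # "A" "B" "C"
```

On the original df both filters return the same 8 rows; they only diverge once a species is repeated within a year.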
Darren Tsai