I have a dataframe with groups of Sequences:

df <- data.frame(
  ID = letters[1:14],
  Sequ = c(NA,1,1,1,2,3,3,3,NA,NA,4,4,4,4)
)

I want to keep only the groups that have more than a critical number n of members; let's suppose n is 3. This attempt selects only the 4th row of the group, not the Sequence as a whole:

library(dplyr)
df %>%
  group_by(Sequ) %>%
  filter(row_number() > 3)
# A tibble: 1 × 2
# Groups:   Sequ [1]
  ID     Sequ
  <chr> <dbl>
1 n         4

So how can I get this desired output, ideally with 'dplyr', though other solutions are welcome as well:

df
  ID Sequ
1  k    4
2  l    4
3  m    4
4  n    4
Chris Ruehlemann

2 Answers

You can use the following code, which first removes the NA rows, then groups by Sequ and keeps only the groups with more than 3 members:

df <- data.frame(
  ID = letters[1:14],
  Sequ = c(NA,1,1,1,2,3,3,3,NA,NA,4,4,4, 4)
)
library(dplyr)
df %>%
  na.omit() %>%
  group_by(Sequ) %>%
  filter(n() > 3)
#> # A tibble: 4 × 2
#> # Groups:   Sequ [1]
#>   ID     Sequ
#>   <chr> <dbl>
#> 1 k         4
#> 2 l         4
#> 3 m         4
#> 4 n         4

Created on 2022-07-31 by the reprex package (v2.0.1)
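
If the critical number changes, the same pipeline can be wrapped in a small function. This is only a sketch: the names keep_large_groups and n_crit are made up for illustration, and it assumes the grouping column is always called Sequ.

```r
library(dplyr)

df <- data.frame(
  ID = letters[1:14],
  Sequ = c(NA, 1, 1, 1, 2, 3, 3, 3, NA, NA, 4, 4, 4, 4)
)

# Hypothetical helper: keep the rows whose Sequ group has more than
# n_crit members. NA rows are dropped first so they never form a group.
keep_large_groups <- function(data, n_crit) {
  data %>%
    filter(!is.na(Sequ)) %>%
    group_by(Sequ) %>%
    filter(n() > n_crit) %>%
    ungroup()
}

result <- keep_large_groups(df, 3)
```

Calling keep_large_groups(df, 2) instead would also keep groups 1 and 3, since each of them has three members.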

Old answer

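The following code also returns the desired rows for this data, but note that it filters on the value of Sequ (greater than 3) rather than on the group size, so it only works here because the one large group happens to be labelled 4: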

df <- data.frame(
  ID = letters[1:14],
  Sequ = c(NA,1,1,1,2,3,3,3,NA,NA,4,4,4, 4)
)
library(dplyr)
df %>%
  group_by(Sequ) %>%
  filter(Sequ > 3)
#> # A tibble: 4 × 2
#> # Groups:   Sequ [1]
#>   ID     Sequ
#>   <chr> <dbl>
#> 1 k         4
#> 2 l         4
#> 3 m         4
#> 4 n         4

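
To illustrate the difference between filtering on the value of Sequ and filtering on the group size, here is a small sketch on made-up data (df2 is hypothetical and not part of the question), where the large group is labelled 1 and a single row is labelled 5:

```r
library(dplyr)

# Hypothetical data: group 1 has four members, group 5 has only one.
df2 <- data.frame(
  ID = letters[1:6],
  Sequ = c(1, 1, 1, 1, 5, NA)
)

# Condition on the value of Sequ: keeps the lone row of group 5.
by_value <- df2 %>%
  filter(Sequ > 3)

# Condition on the group size: keeps the four rows of group 1.
by_size <- df2 %>%
  filter(!is.na(Sequ)) %>%
  group_by(Sequ) %>%
  filter(n() > 3) %>%
  ungroup()
```

Here by_value contains only row e, while by_size contains rows a to d, which is what the question asks for.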

Quinten

Here is a compact option with data.table:

> library(data.table)
> setDT(df)[,.SD[.N>3],Sequ]
   Sequ ID
1:    4  k
2:    4  l
3:    4  m
4:    4  n
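
In the one-liner above, .N is the number of rows in the current group and .SD is that group's subset of the data, so .SD[.N>3] returns either all rows of the group or none. A more explicit way to write the same idea, as a sketch, uses if inside j. It also uses as.data.table() instead of setDT() so that the original df is not modified by reference; the variable names dt and result are only illustrative.

```r
library(data.table)

df <- data.frame(
  ID = letters[1:14],
  Sequ = c(NA, 1, 1, 1, 2, 3, 3, 3, NA, NA, 4, 4, 4, 4)
)

# Work on a copy so df itself stays a plain data.frame.
dt <- as.data.table(df)

# For each Sequ group, return its rows only if the group has more than 3;
# groups that fail the test return NULL and are dropped from the result.
result <- dt[, if (.N > 3L) .SD, by = Sequ]
```

The NA rows form a group of their own here, but with only three members they do not pass the size test.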
ThomasIsCoding