I have a data frame with several columns. I want to keep only the rows for sites (column "code") that contain species of one specific genus, where those species appear at least three times in the whole data frame. In the species column (spp) I have the species name for the species of interest and "NA" for all other species, which are not of interest here. I need to count how many times each species of the genus appears and exclude the sites (code column) that only have species appearing fewer than 3 times in the whole dataset.
df <- data.frame(
  spp  = c("sp1", "sp1", "sp1", "NA", "NA", "sp3", "sp3", "sp3", "sp3", "sp4",
           "NA", "NA", "NA", "NA"),
  code = c("a", "b", "c", "a", "e", "d", "a",
           "b", "c", "f", "b", "a", "b", "c"),
  va1  = c(1, 2, 2, 2, 4, 3, 3, 4, 5, 5, 5, 6, 7, 8)
)
   spp code va1
1  sp1    a   1
2  sp1    b   2
3  sp1    c   2
4   NA    a   2
5   NA    e   4
6  sp3    d   3
7  sp3    a   3
8  sp3    b   4
9  sp3    c   5
10 sp4    f   5
11  NA    b   5
12  NA    a   6
13  NA    b   7
14  NA    c   8
library(dplyr)

# My attempt: this counts all rows per code, including the "NA" rows
filtered_df <- df %>%
  group_by(code) %>%
  filter(n() >= 3)
I'm trying to stipulate that the NAs should not be counted, but it is not working.
I expect the result (below) to contain only the sites (codes) that have at least one species occurring at least 3 times in the whole dataset. Sites that only have species appearing fewer than 3 times should be excluded from the dataset.
   spp code va1
1  sp1    a   1
2  sp1    b   2
3  sp1    c   2
4   NA    a   2
5  sp3    d   3
6  sp3    a   3
7  sp3    b   4
8  sp3    c   5
9   NA    b   5
10  NA    a   6
11  NA    b   7
12  NA    c   8
Codes e and f are excluded because neither contains a species of interest that appears at least 3 times in the whole dataset.
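To make the intended logic concrete, here is a minimal sketch of the two steps described above: first count each species over the whole data frame, then keep every row of any site that has at least one species reaching the threshold. It assumes dplyr is installed, that non-target species are stored as the string "NA" (not a real missing value), and that row 10 holds "sp4" as in the printed table. The name `frequent_spp` is only an illustrative helper.

```r
library(dplyr)

# Example data, rebuilt here so the sketch is self-contained
df <- data.frame(
  spp  = c("sp1", "sp1", "sp1", "NA", "NA", "sp3", "sp3", "sp3", "sp3", "sp4",
           "NA", "NA", "NA", "NA"),
  code = c("a", "b", "c", "a", "e", "d", "a", "b", "c", "f", "b", "a", "b", "c"),
  va1  = c(1, 2, 2, 2, 4, 3, 3, 4, 5, 5, 5, 6, 7, 8)
)

# Step 1: species of interest that occur at least 3 times overall
# (the "NA" placeholder rows are dropped before counting)
frequent_spp <- df %>%
  filter(spp != "NA") %>%
  count(spp) %>%
  filter(n >= 3) %>%
  pull(spp)

# Step 2: keep all rows of every site (code) that has at least one
# of those frequent species; sites without any are dropped entirely
filtered_df <- df %>%
  group_by(code) %>%
  filter(any(spp %in% frequent_spp)) %>%
  ungroup()
```

With this data, `frequent_spp` holds sp1 and sp3, and sites e and f are dropped. Note that this keeps the "NA" rows of the retained sites, as in the expected output. If a rare species such as sp4 occurred in a retained site, its row would also be kept, which may or may not be what is wanted.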