How can I keep only the rows of one data frame (df1) whose sites are listed in a second data frame (keep_sites)? In the example dataset below, three data variables (Data1, Data2, Data3) are associated with four different sites (Site). I would like to keep all the rows in df1 for only those sites that appear in the keep_sites data frame.

Example Dataset:

df1 <- data.frame(matrix(ncol = 4, nrow = 12))
x <- c("Site","Data1","Data2","Data3")
colnames(df1) <- x
df1$Site <- rep(c("A","B","C","D"),3)
set.seed(99)
df1$Data1 <- rnorm(12,4,1)
df1$Data2 <- rnorm(12,16,2)
df1$Data3 <- rnorm(12,32,4)
df1[order(df1$Site, decreasing = FALSE),]

keep_sites <- data.frame(matrix(ncol = 1, nrow = 2))
y <- "Site"
colnames(keep_sites)[1] <- y
keep_sites[1,1] <- "A"
keep_sites[2,1] <- "C"

I have tried this but it only returns those rows associated with the first site (site A) in keep_sites:

df2 <- df1[df1$Site == keep_sites$Site,]

The correct output should look like this:

   Site    Data1     Data2    Data3
1     A 4.213963 17.500109 32.89067
5     A 3.637162 15.211962 26.53026
9     A 3.635883 18.197843 31.41482
3     C 4.087829  9.918132 34.73457
7     C 3.136155 16.997263 37.49222
11    C 3.254231 15.881167 22.82112
tassones

1 Answer

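We need %in% instead of ==. The == operator compares the two vectors element by element, recycling the shorter one (here "A", "C", "A", "C", ...), so a row matches only when its Site happens to line up with the recycled value at the same position. %in% instead tests whether each Site value occurs anywhere in keep_sites$Site.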

out <- df1[df1$Site %in% keep_sites$Site,]
out <- out[order(out$Site),]

Or with subset():

subset(df1, Site %in% keep_sites$Site)
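
To see why == returned only the site A rows in the question, here is a minimal sketch on two toy vectors. The names site, keep, eq_mask and in_mask are hypothetical stand-ins (site mirrors df1$Site and keep mirrors keep_sites$Site); they are not part of the original code.

```r
# Toy vectors mirroring df1$Site and keep_sites$Site from the question.
site <- rep(c("A", "B", "C", "D"), 3)
keep <- c("A", "C")

# `==` recycles `keep` to length 12 (A, C, A, C, ...) and compares
# the two vectors position by position.
eq_mask <- site == keep

# `%in%` asks, for each element of `site`, whether it occurs anywhere in `keep`.
in_mask <- site %in% keep

unique(site[eq_mask])  # only "A" lines up with the recycled values
unique(site[in_mask])  # both "A" and "C" are kept
```

Because 12 is a multiple of 2, the recycling in the == comparison happens silently, without a warning, which is what makes this mistake easy to miss.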
akrun