1

I want to subset a data.table or data frame by two columns using a loop. I am wondering how to do it with a nice way using purrr or dplyr.

Here is my example:

library(data.table)

# my example data
DF <- as.data.table(cbind.data.frame(x = c(100.01, 100.03, 100.09, 100.01, 233.45, 233.56, 233.51, 233.45, 561.1, 556.1, 447.23),
                    y = c(11, 10.8, 10.9, 20, 2, 2.2, 4, 6, 11, 2.1, 10.6)))

# my filter conditions
mycondition <- cbind.data.frame(c1 = c(100, 233.5), c2 = c(11, 2))

## subset DF by selecting the values in mycondition with a tolerance of 0.5
result1 <- DF[x%between%c(mycondition$c1[1] - 0.5, mycondition$c1[1] + 0.5) & y%between%c(mycondition$c2[1] - 0.5, mycondition$c2[1] + 0.5)]

result2 <- DF[x%between%c(mycondition$c1[2] - 0.5, mycondition$c1[2] + 0.5) & y%between%c(mycondition$c2[2] - 0.5, mycondition$c2[2] + 0.5)]

## combined result
result <- rbind.data.frame(result1, result2)

> result
        x    y
1: 100.01 11.0
2: 100.03 10.8
3: 100.09 10.9
4: 233.45  2.0
5: 233.56  2.2

I am wondering if there is a nice way to loop it with R package purrr or use dolyr to do it?

Thanks.

Wang
  • 1,314
  • 14
  • 21

1 Answers1

2

No loop necessary, here's a purely-data.table method.

library(data.table)
setDT(mycondition)
mycondition[, c("c1a", "c1b", "c2a", "c2b") := .(c1-0.5, c1+0.5, c2-0.5, c2+0.5)
  ][DF, on=.(c1a<=x, c1b>=x, c2a<=y, c2b>=y)
  ][!is.na(c1), .(x=c1a, y=c2a)]
#         x     y
#     <num> <num>
# 1: 100.01  11.0
# 2: 100.03  10.8
# 3: 100.09  10.9
# 4: 233.45   2.0
# 5: 233.56   2.2

FYI, your code to generate the data is unnecessarily convoluted. The following are functionally equivalent:

DF <- as.data.table(cbind.data.frame(x = c(100.01, 100.03, 100.09, 100.01, 233.45, 233.56, 233.51, 233.45, 561.1, 556.1, 447.23),
                    y = c(11, 10.8, 10.9, 20, 2, 2.2, 4, 6, 11, 2.1, 10.6)))
DF <- data.table(x = c(100.01, 100.03, 100.09, 100.01, 233.45, 233.56, 233.51, 233.45, 561.1, 556.1, 447.23),
                 y = c(11, 10.8, 10.9, 20, 2, 2.2, 4, 6, 11, 2.1, 10.6))

mycondition <- cbind.data.frame(c1 = c(100, 233.5), c2 = c(11, 2))
mycondition <- data.frame(c1 = c(100, 233.5), c2 = c(11, 2)) # though not a data.table, as I need above
mycondition <- data.table(c1 = c(100, 233.5), c2 = c(11, 2))

And since you're already using data.table, I suggest you stop using rbind.data.frame, as it is not as featureful (or perhaps safe?) as rbind.data.table, aka rbind(DF, ...). Perhaps better is data.table::rbindlist(list(result1, result2)), as it has some added functionality.

r2evans
  • 141,215
  • 6
  • 77
  • 149