I have two data.tables that provide sequence coordinates across different chromosomes (categories). For example:
library(data.table)
dt1 <- data.table(chromosome = c("1", "1", "1", "1", "X"),
                  start = c(1, 50, 110, 150, 110),
                  end = c(11, 100, 121, 200, 200))
dt2 <- data.table(chromosome = c("1", "1", "X"),
                  start = c(12, 60, 50),
                  end = c(20, 115, 80))
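I need to create a third data.table containing the parts of each dt1 interval that are not covered by any dt2 interval on the same chromosome. Intervals are inclusive integer ranges, so a dt1 interval that is only partly covered should be trimmed rather than dropped. For the data above, the expected result is: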
dt3 <- data.table(chromosome = c("1", "1", "1", "1", "X"),
                  start = c(1, 50, 116, 150, 110),
                  end = c(11, 59, 121, 200, 200))
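For reference, the intended result can be pinned down with a brute-force sketch that expands every interval into its individual integer positions, removes the dt1 positions that also occur in dt2, and collapses what remains back into runs. This is only meant to define the expected output, not to be a usable solution: it allocates one row per position, so it would not scale to large tables. The names `x`, `y`, `pos1`, `pos2`, `kept`, and `naive` are illustrative, and keeping each dt1 row separate (via `id`) is an assumption about the desired behaviour.

```r
library(data.table)

dt1 <- data.table(chromosome = c("1", "1", "1", "1", "X"),
                  start = c(1, 50, 110, 150, 110),
                  end = c(11, 100, 121, 200, 200))
dt2 <- data.table(chromosome = c("1", "1", "X"),
                  start = c(12, 60, 50),
                  end = c(20, 115, 80))

# Work on copies so the inputs are not modified by reference.
x <- copy(dt1)[, id := .I]
y <- copy(dt2)[, id := .I]

# Expand each interval into one row per integer position (memory-hungry).
pos1 <- x[, .(pos = seq(start, end)), by = .(id, chromosome)]
pos2 <- y[, .(pos = seq(start, end)), by = .(id, chromosome)]

# Anti-join: keep dt1 positions that no dt2 interval covers on the same chromosome.
kept <- pos1[!pos2, on = .(chromosome, pos)]

# Within each original dt1 row, label runs of consecutive positions.
kept[, run := cumsum(c(TRUE, diff(pos) != 1L)), by = id]

# Collapse each run back into a start/end pair.
naive <- kept[, .(start = as.numeric(min(pos)), end = as.numeric(max(pos))),
              by = .(chromosome, id, run)]
naive[, c("id", "run") := NULL]
```

On the toy data, `naive` has the same rows as dt3 above. A dt1 interval with a dt2 interval strictly inside it would be split into two rows by the run labelling.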
The real data.tables are very large, so performance is the main concern. I have tried using foverlaps(), but I could not get it to produce this result. Any help would be greatly appreciated!