This is a general question rather than one about a specific existing example.
I have a very large data.table object with 130 million rows. Filtering rows on an AND condition runs in no time, but filtering on an OR condition ("|") takes forever. For example, code like
df[start %chin% stops$names | end %chin% stops$names]
takes far too long to wait for. I must be missing something, since data.table is supposed to be very fast. Thank you in advance.
Here is some example data:
library(data.table)
library(tibble)
# Dates must be quoted: unquoted 2022-01-13 is evaluated as arithmetic (2022 - 1 - 13)
df <- setDT(tibble(start = c("stop1", "stop2", "stop3", "stop1", "stop4"),
                   end = c("stop13", "stop2", "stop15", "stop2", "stop3"),
                   via = c("\\stop3\\", "-", "-", "\\stop4, stop5\\", "-"),
                   date = as.Date(c("2022-01-13", "2022-04-05", "2022-03-04", "2022-04-03", "2022-01-18"))))
stops <- setDT(tibble(names = c("stop5", "stop6", "stop7", "stop10"),
                      line = c("1", "23", "450a", "2")))
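One thing I could try is to run the two lookups separately and combine the matching row numbers, instead of evaluating the OR in a single expression. The sketch below is only an idea: I have not measured whether it is actually faster on 130 million rows. The vector `wanted` is a made-up lookup (not my real `stops$names`), chosen so that the small sample returns some rows.

```r
library(data.table)

# Reduced version of the sample data (only the two columns used in the filter)
df <- data.table(start = c("stop1", "stop2", "stop3", "stop1", "stop4"),
                 end   = c("stop13", "stop2", "stop15", "stop2", "stop3"))

# Hypothetical lookup vector, for illustration only
wanted <- c("stop2", "stop3")

# Original form: one OR expression evaluated over all rows
res_or <- df[start %chin% wanted | end %chin% wanted]

# Alternative: get the matching row numbers for each condition separately
# (which = TRUE returns row numbers instead of the subset itself) ...
idx_start <- df[start %chin% wanted, which = TRUE]
idx_end   <- df[end %chin% wanted, which = TRUE]

# ... then combine them, dropping duplicates and restoring the original row order
keep <- sort(union(idx_start, idx_end))
res_union <- df[keep]
```

Both `res_or` and `res_union` should contain the same rows in the same order; the only difference is how the matching rows are found.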