I'm new to R and have a difficult task I want to complete.
I have two data frames. DF1 has 810 observations of 4 variables, and DF2 has 1707 observations of 51 variables. Here is a sample of each.
DF1:
Chr POS Range_Plus_10 Range_Minus_10
2 47403201 47403211 47403191
2 47403202 47403212 47403192
2 47403210 47403220 47403200
2 47403210 47403220 47403200
2 47403210 47403220 47403200
2 47403211 47403221 47403201
DF2:
Chromosome Position
2 47630258
2 47630263
2 47630263
2 47630269
2 47630271
2 47630275
Note: not all variables are shown for DF2. I am not interested in the other variables, but it would be good to keep them in the output.
What I want is to check every position in DF2 against the ranges in DF1 (between Range_Minus_10 and Range_Plus_10 in each row). For example, the first position in DF2 is 47630258, and I want to know whether it lies between Range_Minus_10 and Range_Plus_10 in any row of DF1. The output should list, for each row of DF1, all positions in DF2 that fall within that row's range.
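To make the goal concrete, here is a minimal base R sketch on made-up data. The toy values, the `in_range` column, and the helper name `in_any_range` are hypothetical, and requiring the chromosome to match is an assumption on top of the description above.

```r
# Toy data (made-up values, not the real data frames)
df1 <- data.frame(Chr            = c(2, 2),
                  POS            = c(47403201, 47630260),
                  Range_Plus_10  = c(47403211, 47630270),
                  Range_Minus_10 = c(47403191, 47630250))
df2 <- data.frame(Chromosome = c(2, 2, 2),
                  Position   = c(47630258, 47630263, 47630275))

# Hypothetical helper: TRUE if `pos` lies strictly between Range_Minus_10
# and Range_Plus_10 of at least one row of `ranges` on the same chromosome
in_any_range <- function(chrom, pos, ranges) {
  any(ranges$Chr == chrom &
      pos > ranges$Range_Minus_10 &
      pos < ranges$Range_Plus_10)
}

# Apply the check to every row of df2 and store the result as a new column
df2$in_range <- mapply(in_any_range, df2$Chromosome, df2$Position,
                       MoreArgs = list(ranges = df1))
df2
```

This only flags which DF2 positions hit some range. It does not say which DF1 row they matched, which is what the join is meant to provide.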
I tried to use a non-equi join, but I keep getting an error and I am not sure what went wrong. Could someone provide code that produces the output I want, and also explain why the error occurs?
Here is the script I used:
library(data.table)
result <- df2[df1, .("Chromosome", "Position"), on = .(Position < Range_Plus_10, Position > Range_Minus_10), by = .EACHI]
But I keep getting an error message:
Error in `[.data.frame`(df2, df1, .("Chr", "Position", ...), on = .(Position < :
unused arguments (on = .(Position < Range_Plus_10, Position > Range_Minus_10), by = .EACHI)
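The error is raised by `[.data.frame`, which suggests that df2 is still a plain data.frame, so the data.table-only arguments `on` and `by` are rejected as unused. A minimal sketch of one possible fix, assuming both tables are first converted with `setDT()`. The toy values and the `Other` column are made up, and matching on chromosome is an assumption.

```r
library(data.table)

# Toy data (made-up values); `Other` stands in for the extra df2 columns
df1 <- data.frame(Chr            = c(2, 2),
                  POS            = c(47403201, 47630260),
                  Range_Plus_10  = c(47403211, 47630270),
                  Range_Minus_10 = c(47403191, 47630250))
df2 <- data.frame(Chromosome = c(2, 2, 2),
                  Position   = c(47630258, 47630263, 47630275),
                  Other      = c("a", "b", "c"))

# Convert both to data.table by reference so that `on =` is understood
setDT(df1)
setDT(df2)

# Non-equi join: for each df1 row, keep the df2 rows whose Position lies
# strictly inside that row's range. The x./i. prefixes pick the original
# df2 (x) and df1 (i) values; nomatch = NULL drops df1 rows with no hit.
result <- df2[df1,
              on = .(Chromosome == Chr,
                     Position > Range_Minus_10,
                     Position < Range_Plus_10),
              nomatch = NULL,
              .(Chromosome = x.Chromosome,
                Position   = x.Position,
                Other,
                POS            = i.POS,
                Range_Minus_10 = i.Range_Minus_10,
                Range_Plus_10  = i.Range_Plus_10)]
result
```

Note that in j the column names are written unquoted. Quoted strings such as "Chromosome" would return the literal text rather than the column values.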
Sorry for my formatting