I'm new to R and I have a difficult task I want to complete.

I have two data frames. DF1 consists of 810 observations of 4 variables, and DF2 consists of 1707 observations of 51 variables. Here are some example rows.

DF1:

Chr POS Range_Plus_10 Range_Minus_10
2 47403201 47403211 47403191
2 47403202 47403212 47403192
2 47403210 47403220 47403200
2 47403210 47403220 47403200
2 47403210 47403220 47403200
2 47403211 47403221 47403201

DF2:

Chromosome Position
2 47630258
2 47630263
2 47630263
2 47630269
2 47630271
2 47630275

Note: not all variables are shown for df2. I am not interested in the other variables, but it would be good to keep them in the output data.

What I want is to check every position in df2 to see whether it lies within the range of any row of df1 (between Range_Minus_10 and Range_Plus_10). For example, the first position in df2 is 47630258, and I want to know whether 47630258 falls between Range_Minus_10 and Range_Plus_10 for any row of df1. The output should list, for every row of df1, all positions in df2 that fall within that row's range.

I tried to use a non-equi join, but I keep getting an error and I am not sure what went wrong. Could someone provide code to obtain the data I want, and also tell me why my error occurs?

Here is the script I used:

library(data.table)

result <- df2[df1, .("Chromosome", "Position"), on = .(Position < Range_Plus_10, Position > Range_Minus_10), by = .EACHI]

But I keep getting an error message: Error in [.data.frame(df2, df1, .("Chr", "Position", ...), on = .(Position < : unused arguments (on = .(Position < Range_Plus_10, Position > Range_Minus_10), by = .EACHI)
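
For reference, here is a minimal sketch of what a working version might look like. It assumes the error arises because df2 is still a plain data.frame: the message names `[.data.frame`, and that method does not accept `on =` or `by = .EACHI`. The toy rows and the `Other` column are invented for illustration and are not the real data. It also adds a match on chromosome, which is an assumption about what is wanted.

```r
library(data.table)

# Toy data, invented for illustration only (not the real 810 / 1707 rows)
df1 <- data.frame(
  Chr            = c(2, 2),
  POS            = c(47403201, 47630260),
  Range_Plus_10  = c(47403211, 47630270),
  Range_Minus_10 = c(47403191, 47630250)
)
df2 <- data.frame(
  Chromosome = c(2, 2, 2),
  Position   = c(47630258, 47630263, 47630275),
  Other      = c("a", "b", "c")  # hypothetical stand-in for the other 49 columns
)

# Convert both data.frames to data.tables by reference, so that
# `[` dispatches to the data.table method that understands `on =`
setDT(df1)
setDT(df2)

# Non-equi join: for each row of df1, find the df2 rows on the same
# chromosome whose Position lies strictly inside that row's range.
# nomatch = NULL drops df1 rows that have no matching position.
result <- df2[df1,
  on = .(Chromosome == Chr,
         Position > Range_Minus_10,
         Position < Range_Plus_10),
  nomatch = NULL,
  # x.Position is the original df2 value; after a non-equi join the plain
  # Position column would otherwise carry the range bound from df1
  .(Chromosome, Position = x.Position, Other,
    POS = i.POS,
    Range_Minus_10 = i.Range_Minus_10,
    Range_Plus_10 = i.Range_Plus_10)
]

print(result)
```

The column names are passed unquoted inside `.()`. Writing `.("Chromosome", "Position")` would return the literal strings instead of the column values.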

Sorry for my formatting.