R: merge by combination of first three columns in two dataframes

Question

I've got two dataframes that look like this:

==fileA    
LmjF.01 257506  257508  1
LmjF.01 257508  257509  2
LmjF.01 257509  257510  3
LmjF.01 257510  257511  4
LmjF.01 257511  257514  5
[...]

==fileB    
LmjF.01 291121  291123  123
LmjF.01 291123  291125  122
LmjF.01 291125  291126  123
LmjF.01 291126  291128  122
LmjF.01 291128  291129  121
[...]

I would like to merge them into a single data frame, matching rows whose first, second and third columns agree in both sets. The second and third columns are the start and end positions of a window of varying size, and the fourth column holds the score for that window. I want the score from the other file appended as an extra column for each matching row, like so:

==fileM
LmjF.01 291121  291123  123  12
LmjF.01 291123  291125  122  43
LmjF.01 291125  291126  123  434
LmjF.01 291126  291128  122  342
LmjF.01 291128  291129  121  43
[...]

Any ideas how to do this window merging of both sets?

Related question, about joining datasets indexed by (potentially overlapping) intervals: http://stackoverflow.com/questions/8882538/merge-two-data-frames-on-non-overlapping-intervals — Vincent Zoonekynd, Feb 16 '12 at 11:54

Richie Cotton · Answer 1 · score 2

Here's some reproducible data:

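# 27 overlapping two-day windows, each identified by id, start and end
common <- data.frame(
    id    = 1:27,
    start = seq(as.POSIXct("2012-02-01"), as.POSIXct("2012-02-27"), "days"),
    end   = seq(as.POSIXct("2012-02-03"), as.POSIXct("2012-02-29"), "days")
)
set.seed(42)  # fix the RNG so the random subsets below are reproducible
# a and b each hold a random subset of the windows with their own scores
a <- common[sample(27, 15), ]
a$value <- runif(15)
b <- common[sample(27, 20), ]
b$value <- rnorm(20)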

To merge the data frames you simply call merge, specifying the columns to match on.

merge(a, b, by = c("id", "start", "end"))
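To see what `merge` does with rows whose keys appear in only one input, here is a small sketch on made-up data (the names `a_small`, `b_small` and all values are hypothetical). By default `merge` performs an inner join, so only key triples present in both data frames survive; passing `all = TRUE` keeps every row and fills the missing side with `NA`. The shared non-key column `value` is split into `value.x` and `value.y`.

```r
# Hypothetical data: two tables sharing the key columns id, start, end.
a_small <- data.frame(id = c(1, 2, 3), start = c(10, 20, 30),
                      end = c(20, 30, 40), value = c(0.1, 0.2, 0.3))
b_small <- data.frame(id = c(2, 3, 4), start = c(20, 30, 40),
                      end = c(30, 40, 50), value = c(-1, -2, -3))

keys <- c("id", "start", "end")

# Inner join (the default): only ids 2 and 3 occur in both tables.
inner <- merge(a_small, b_small, by = keys)
print(inner)

# Outer join: keep all rows from both tables, padding with NA.
outer <- merge(a_small, b_small, by = keys, all = TRUE)
print(outer)
```

Use `all.x = TRUE` or `all.y = TRUE` instead if you only want to keep the unmatched rows of one side.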

answered Feb 16 '12 at 11:39 by Richie Cotton (edited Feb 16 '12 at 11:45)

In this example, either all three columns (id, start, end) match or none of them do. You need to have a think about what behaviour you'd like if only some of the columns match. – Richie Cotton Feb 16 '12 at 11:42
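One possible behaviour when the windows don't line up exactly (the situation in the related question linked under the question) is to pair windows that overlap. Below is a minimal base-R sketch, assuming windows are half-open intervals [start, end) on the same chromosome; the column names `chr`, `scoreA`, `scoreB` and the values are hypothetical. It forms every pair of windows per chromosome and keeps the overlapping ones, so it is only practical for modest data sizes.

```r
# Hypothetical interval tables; windows are treated as half-open [start, end).
fileA <- data.frame(chr = "LmjF.01", start = c(100, 200, 300),
                    end = c(150, 260, 320), scoreA = c(1, 2, 3))
fileB <- data.frame(chr = "LmjF.01", start = c(140, 250, 400),
                    end = c(210, 300, 410), scoreB = c(10, 20, 30))

# Pair every A window with every B window on the same chromosome ...
pairs <- merge(fileA, fileB, by = "chr", suffixes = c(".A", ".B"))

# ... then keep only the pairs whose intervals overlap.
overlaps <- subset(pairs, start.A < end.B & start.B < end.A)
print(overlaps)
```

With half-open windows, adjacent windows such as 257506-257508 and 257508-257509 do not count as overlapping. For large tables, an interval join such as `data.table::foverlaps` avoids building the full cross product.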

score 2 · Accepted Answer · answered Feb 16 '12 at 11:39

I'm not sure I understood your problem correctly, but the function merge() seems to be your friend:

fileM <- merge(fileA, fileB, by.x=1:3, by.y=1:3)
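For whitespace-separated files without a header, like the ones in the question, `read.table(header = FALSE)` names the columns `V1` to `V4`, and `merge` accepts key columns by position. Since `by.x` and `by.y` default to `by`, `by = 1:3` is equivalent to the call above. The sketch below is a hypothetical example: it reads inline text instead of real files (with real files, pass the path instead of `text =`), and the fileA scores are borrowed from the question's fileM for illustration. Putting fileB first keeps its score in the fourth column, as in fileM.

```r
# Inline stand-ins for the two files (hypothetical contents).
txtA <- "LmjF.01 291121 291123 12
LmjF.01 291123 291125 43
LmjF.01 291125 291126 434"
txtB <- "LmjF.01 291121 291123 123
LmjF.01 291123 291125 122
LmjF.01 291128 291129 121"

# header = FALSE gives the default column names V1..V4.
fileA <- read.table(text = txtA, header = FALSE)
fileB <- read.table(text = txtB, header = FALSE)

# Match on columns 1 to 3 (chromosome, start, end) by position.
fileM <- merge(fileB, fileA, by = 1:3)

# V4.x is fileB's score and V4.y is fileA's; give them readable names.
names(fileM) <- c("chr", "start", "end", "scoreB", "scoreA")
print(fileM)
```

Only the two windows present in both inputs are kept; add `all.x = TRUE` to also keep fileB windows that have no match in fileA.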

Seb
