0

I have two data frames : Laser and Chamber :

>Laser
      HeiferID            DateHour MeasurePPM
    8474 2016-12-18 15:17:22         49
    8474 2016-12-18 15:17:22         63
    8474 2016-12-18 15:17:23         18
    8474 2016-12-18 15:17:23         83
    8474 2016-12-18 15:17:24        108
    8474 2016-12-18 15:17:24        145

.....

and

      HeiferID            DateHour Chamberppm
     8474 2016-12-18 15:07:24       84.6
     8474 2016-12-18 15:11:09       87.0
     8474 2016-12-18 15:14:54       87.5
     8474 2016-12-18 15:18:39       93.2
     8474 2016-12-18 15:22:24       90.6
     8474 2016-12-18 15:26:09       89.5

...

(That's just an exemple, there're 93184 rows in Laser and 17266 in Chamber).

So for example, here the nearest Chamber$DateHour is 15:18:39 and the corresponding value is 93.2. So I would like to add 93.2 to all the rows that match 15:18:39 as the nearest DateHour.

I would like to do that for all the Laser$DateHour : find the nearest Chamber$DateHour and add the Chamberppm corresponding. So the condition will be : HeiferID (DateHour is not unique for Chamber) and the nearest DateHour between the two dataset (I have trouble to find how write this). It's easy to add the column when the date are equal :

> LaserChamber <- inner_join(Chamber,Laser, by=c("HeiferID","DateHour"))

But when it's the closest date, I can't write it. Normally I will end up with the same value of Chamberppm for several rows in the Laser data. That's normal. (In the chamber data, measurement were taken every 4min, in the laser one, every 0,5s !)

I tried to use the package data.table or dplyr but it didn't work at all.

Any idea ? Thanks for your help

***Update : I've tried this following the comments :

> LaserChamber1 <- Laser1[Chamber,roll = "nearest", on=c("HeiferID")]
Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch,  : 
  Attempting roll join on factor column x.HeiferID. Only integer, double or character colums may be roll joined.
Marie
  • 45
  • 1
  • 10
  • 1
    Maybe something like `DT1[DT2, on=cols, roll="nearest", v := i.v]` http://stackoverflow.com/q/33346743/ – Frank Jan 20 '17 at 19:47
  • I'm gonna try that thanks. What is the "on=cols" for ? – Marie Jan 20 '17 at 20:03
  • I've tested something, I put it on the post – Marie Jan 20 '17 at 20:14
  • I meant `on=cols` as pseudocode. In place of cols, you'll want something like what you have for `by=` in your `inner_join`, so `on=c("HeiferID", "DateHour")`, I think. I haven't tested with your example because it's hard to read in. You can read more about how rolling joins work in the docs by typing `?data.table`. – Frank Jan 20 '17 at 21:10
  • I've tried to put "on=c("HeiferID") but I got this message as an error "Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch, : Only integer, double or character colums may be roll joined." Does it mean I have to change the HeiferID's class ? – Marie Jan 21 '17 at 09:49
  • I was suggesting `on=c("HeiferID", "DateHour")`. It tries to "roll" on the final column in the `on=`. It doesn't like rolling on a categorical variable / factor like HeiferID because it is not in a metric space (so there is no sense of "distance" between two IDs and so no sense of a "nearest" ID to another). So you should keep the class but change the `on=` vector. – Frank Jan 21 '17 at 14:44
  • Thanks. Finally I cut the data into two files in a way to have unique dates. thank you for your help – Marie Jan 22 '17 at 21:28

0 Answers0