I have two data frames:
#df1
df1 = data.frame(id = c("A","B","C","D","E"),
dev = c(213.5, 225.1, 198.9, 201.0, 266.8))
df1
id dev
1 A 213.5
2 B 225.1
3 C 198.9
4 D 201.0
5 E 266.8
#df2
df2 = data.frame(DateTime = seq(
from = as.POSIXct("1986-1-1 0:00"),
to = as.POSIXct("1986-1-2 23:00"),
by = "hour"),
cum_dd = seq(from = 185, to = 295, by = 2.3))
head(df2)
DateTime cum_dd
1 1986-01-01 00:00:00 185.0
2 1986-01-01 01:00:00 187.3
3 1986-01-01 02:00:00 189.6
4 1986-01-01 03:00:00 191.9
5 1986-01-01 04:00:00 194.2
6 1986-01-01 05:00:00 196.5
I would like to make a new column in df1 listing the earliest df2$DateTime at which df2$cum_dd exceeds df1$dev.
Here is my desired result:
id dev desired
1 A 213.5 1986-01-01 13:00:00
2 B 225.1 1986-01-01 18:00:00
3 C 198.9 1986-01-01 07:00:00
4 D 201.0 1986-01-01 07:00:00
5 E 266.8 1986-01-02 12:00:00
I am familiar with the min(which()) function in dplyr, which, when formatted as follows returns the first row number in df2 at which cum_dd is above 200:
library(dplyr)
min(which (df2$cum_dd > 200))
In effect I want to run this function for each row in df1 (replacing the "200" with df1$dev), and look up/extract the corresponding df2$DateTime value rather than the row number.
I thought I was getting close with this, but it's not quite right, and I cannot find a comparable problem in Stack Overflow:
desired <- apply(df1, 1,
function (x) {ddply(df2, .(DateTime), summarize,
min(which (df2$cum_dd > df1$dev)))})
Thank you so much if you have a solution!