Look up and extract values exceeding a threshold in r

Question

I have two data frames:

#df1
df1 = data.frame(id = c("A","B","C","D","E"), 
                 dev = c(213.5, 225.1, 198.9, 201.0, 266.8))
df1
   id   dev
1  A 213.5
2  B 225.1
3  C 198.9
4  D 201.0
5  E 266.8   

#df2
df2 = data.frame(DateTime = seq(
  from = as.POSIXct("1986-1-1 0:00"),
  to = as.POSIXct("1986-1-2 23:00"), 
  by = "hour"), 
  cum_dd = seq(from = 185, to = 295, by = 2.3)) 
head(df2) 
             DateTime cum_dd
1 1986-01-01 00:00:00  185.0
2 1986-01-01 01:00:00  187.3
3 1986-01-01 02:00:00  189.6
4 1986-01-01 03:00:00  191.9
5 1986-01-01 04:00:00  194.2
6 1986-01-01 05:00:00  196.5

I would like to make a new column in df1 listing the earliest df2$DateTime at which df2$cum_dd exceeds df1$dev.

Here is my desired result:

  id   dev             desired
1  A 213.5 1986-01-01 13:00:00
2  B 225.1 1986-01-01 18:00:00
3  C 198.9 1986-01-01 07:00:00
4  D 201.0 1986-01-01 07:00:00
5  E 266.8 1986-01-02 12:00:00

I am familiar with the min(which()) function in dplyr, which, when formatted as follows returns the first row number in df2 at which cum_dd is above 200:

library(dplyr)
min(which (df2$cum_dd > 200))

In effect I want to run this function for each row in df1 (replacing the "200" with df1$dev), and look up/extract the corresponding df2$DateTime value rather than the row number.

I thought I was getting close with this, but it's not quite right, and I cannot find a comparable problem in Stack Overflow:

desired <- apply(df1, 1, 
           function (x) {ddply(df2, .(DateTime), summarize, 
           min(which (df2$cum_dd > df1$dev)))})

Thank you so much if you have a solution!

AntoniosK · Answer 1 · 2018-08-24T21:12:34.443

3

# example datasets
df1 = data.frame(id = c("A","B","C","D","E"), 
                 dev = c(213.5, 225.1, 198.9, 201.0, 266.8))

df2 = data.frame(DateTime = seq(
  from = as.POSIXct("1986-1-1 0:00"),
  to = as.POSIXct("1986-1-2 23:00"), 
  by = "hour"), 
  cum_dd = seq(from = 185, to = 295, by = 2.3)) 

library(tidyverse)

df1 %>% 
  crossing(df2) %>%         # get all combinations of rows
  group_by(id, dev) %>%     # for each id and dev
  summarise(desired = min(DateTime[cum_dd > dev])) %>%  # get minimum date when cumm_dd exeeds dev
  ungroup()                 # forget the grouping

# # A tibble: 5 x 3
#   id      dev desired            
#   <fct> <dbl> <dttm>             
# 1 A      214. 1986-01-01 13:00:00
# 2 B      225. 1986-01-01 18:00:00
# 3 C      199. 1986-01-01 07:00:00
# 4 D      201  1986-01-01 07:00:00
# 5 E      267. 1986-01-02 12:00:00

edited Aug 24 '18 at 21:12

answered Aug 24 '18 at 20:59

AntoniosK

15,991
2
19
32

Brilliant, thank you @AntoniosK! Two questions for you: #1) How can I bind the newly created "desired" column onto df1? #2) What exactly is happening with crossing(df2)? – Monte Aug 24 '18 at 21:45
`crossing` is creating all combinations between rows of `df1` and `df2`. More info here `?crossing`. The new dataset is same as `df1` plus the new column. So, you can simply do `df1 = df1 %>% crossing(df2) %>% ...` and this new dataset will be saved as `df1`. – AntoniosK Aug 24 '18 at 21:53
Got it, thank you. I noticed you edited the code somewhat, and now I'm no longer able to reproduce what you have posted. Now, the code only returns a single row and single column. I'm trying to retrace my steps to see what went wrong... – Monte Aug 24 '18 at 22:08
My last edit was 56' ago! Maybe you updated your `df1` and then you tried to run the process from the beginning using (the updated) `df1`? Try to start a fresh R session and run the example above to make sure it runs correctly. – AntoniosK Aug 24 '18 at 22:11

mkeskisa · Answer 2 · 2018-08-24T21:06:21.410

library(tidyverse)
df1 = data.frame("id" = c("A","B","C","D","E"), "dev" = c(213.5, 225.1, 198.9, 201.0, 266.8))

df2 = data.frame("DateTime" = seq(
  from = as.POSIXct("1986-1-1 0:00"),
  to = as.POSIXct("1986-1-2 23:00"), 
  by = "hour"), 
  "cum_dd" = seq(from = 185, to = 295, by = 2.3)) 

df2 %>% 
  crossing(df1) %>% 
  filter(cum_dd > dev) %>% 
  arrange(DateTime, desc(cum_dd)) %>% 
  group_by(id) %>% 
  distinct(id, .keep_all = T)

Look up and extract values exceeding a threshold in r

2 Answers2