
I would love some help calculating the time since the temperature was as cold as it was on a particular date.

So in the example data frame below, for the first record (01/07/2000) the previous time it was as cold as this (-1) was 01/01/2000 (around 182 days before).

For the second record (01/06/2000), the previous time it was at least that cold (2 degrees) was the previous month (01/05/2000), when it was actually colder (1 degree), so around 30 days before.

df <- data.frame(date=as.Date(c("01/07/2000", "01/06/2000", "01/05/2000", 
                                "01/04/2000", "01/03/2000", "01/02/2000", 
                                "01/01/2000"), "%d/%m/%Y"), 
                 temperature =c(-1, 2, 1, 0, 1, 1, -1))
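
To make the intended result concrete, here is a minimal brute-force sketch that reproduces the two cases described above. The helper name `days_since_as_cold` is made up for illustration, and it assumes "as cold" means a temperature less than or equal to the one on the date in question.

```r
df <- data.frame(date = as.Date(c("01/07/2000", "01/06/2000", "01/05/2000",
                                  "01/04/2000", "01/03/2000", "01/02/2000",
                                  "01/01/2000"), "%d/%m/%Y"),
                 temperature = c(-1, 2, 1, 0, 1, 1, -1))

# Hypothetical helper: days between row i and the most recent earlier date
# whose temperature was less than or equal to the temperature in row i
days_since_as_cold <- function(i, data) {
  earlier <- data$date < data$date[i] & data$temperature <= data$temperature[i]
  if (!any(earlier)) return(NA_real_)  # never that cold before
  as.numeric(data$date[i] - max(data$date[earlier]))
}

first_record  <- days_since_as_cold(1, df)  # 01/07/2000 compared with 01/01/2000
second_record <- days_since_as_cold(2, df)  # 01/06/2000 compared with 01/05/2000
```

This works row by row and does not depend on the sort order of the data frame, but it is only meant to pin down the definition, not to scale to many grid squares.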

I have tried modifying this approach (Calculate days since last event in R) but found it became unwieldy when calculating for each week.

Any ideas how you might calculate the number of days since the weather was that cold, for each week? Many thanks indeed for your help.

jay.sf
threeisles
  • A `data.table` non-equi join is handy here: `df[ , prev_date := df[.SD, on = .(date < date, temperature <= temperature), i.date - x.date, mult = "first"]]`. This could also easily be done by group. – Henrik Mar 08 '21 at 07:26
  • Please see my updated answer below. – jay.sf Mar 08 '21 at 08:07

3 Answers


Suppose you have temperature data for different grids like this:

#          date grid temp
# 1  2000-01-01    A   -1
# 2  2000-02-01    A   -1
# 3  2000-03-01    A   -1
# ...
# 10 2000-01-01    B    2
# 11 2000-02-01    B    1
# ...

You could use a split-apply-combine approach along the grids using by. Within each grid unit, we apply a Vectorized function that calculates the number of days since the previous occurrence of the temperature recorded on a specific date. If there is no earlier occurrence, it gives NA.

f <- Vectorize(function(data, x) {
  diff(rev(with(data, date[date <= x & temp == temp[date == x]]))[2:1])
}, vectorize.args="x")
res <- do.call(rbind, by(d, d$grid, function(g) cbind(g, last=f(g, g$date))))

res
#            date grid temp last
# A.1  2000-01-01    A   -1   NA
# A.2  2000-02-01    A   -1   31
# A.3  2000-03-01    A   -1   29
# A.4  2000-04-01    A   -1   31
# A.5  2000-05-01    A    0   NA
# A.6  2000-06-01    A    2   NA
# A.7  2000-07-01    A    0   61
# A.8  2000-08-01    A    0   31
# A.9  2000-09-01    A   -1  153
# B.10 2000-01-01    B    2   NA
# B.11 2000-02-01    B    1   NA
# B.12 2000-03-01    B    2   60
# B.13 2000-04-01    B    1   60
# B.14 2000-05-01    B    2   61
# B.15 2000-06-01    B   -1   NA
# B.16 2000-07-01    B   -1   30
# B.17 2000-08-01    B    0   NA
# B.18 2000-09-01    B    2  123
# C.19 2000-01-01    C    0   NA
# C.20 2000-02-01    C    0   31
# C.21 2000-03-01    C    1   NA
# C.22 2000-04-01    C    1   31
# C.23 2000-05-01    C   -1   NA
# C.24 2000-06-01    C   -1   31
# C.25 2000-07-01    C    1   91
# C.26 2000-08-01    C    2   NA
# C.27 2000-09-01    C   -1   92
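
The `diff(rev(...)[2:1])` idiom inside `f` can be hard to read, so here is a small sketch of what it does on its own. The `matches` vector is made-up toy data standing in for the dates that pass the filter in one grid, oldest first.

```r
# Toy example (made-up dates): dates that match the filter, oldest first
matches <- as.Date(c("2000-01-01", "2000-02-01", "2000-03-01"))

# rev() puts the newest date first; [2:1] then picks (previous, current)
pair <- rev(matches)[2:1]

# diff() gives current minus previous, here 29 days (2000 is a leap year)
gap <- diff(pair)

# With a single match there is no previous date: position 2 is NA,
# so the difference is NA as well
gap_first <- diff(rev(matches[1])[2:1])
```

This is why the first occurrence of each temperature within a grid shows NA in the `last` column.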

Edit

To find the number of days since the temperature was last below a specific threshold temp.th, we could modify the function like so:

temp.th <- 0
f2 <- Vectorize(function(data, x) {
  x - rev(with(data, date[date <= x & temp < temp.th]))[1]
}, vectorize.args="x")
res2 <- do.call(rbind, by(d, d$grid, function(g) cbind(g, last=f2(g, g$date))))

res2
#            date grid temp last
# A.1  2000-01-01    A   -1    0
# A.2  2000-02-01    A   -1    0
# A.3  2000-03-01    A   -1    0
# A.4  2000-04-01    A   -1    0
# A.5  2000-05-01    A    0   30
# A.6  2000-06-01    A    2   61
# A.7  2000-07-01    A    0   91
# A.8  2000-08-01    A    0  122
# A.9  2000-09-01    A   -1    0
# B.10 2000-01-01    B    2   NA
# B.11 2000-02-01    B    1   NA
# B.12 2000-03-01    B    2   NA
# B.13 2000-04-01    B    1   NA
# B.14 2000-05-01    B    2   NA
# B.15 2000-06-01    B   -1    0
# B.16 2000-07-01    B   -1    0
# B.17 2000-08-01    B    0   31
# B.18 2000-09-01    B    2   62
# C.19 2000-01-01    C    0   NA
# C.20 2000-02-01    C    0   NA
# C.21 2000-03-01    C    1   NA
# C.22 2000-04-01    C    1   NA
# C.23 2000-05-01    C   -1    0
# C.24 2000-06-01    C   -1    0
# C.25 2000-07-01    C    1   30
# C.26 2000-08-01    C    2   61
# C.27 2000-09-01    C   -1    0
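
Note that `f2` uses `date <= x`, so a date that is itself below the threshold counts as the most recent event and yields 0. If you would rather measure back to a strictly earlier date, a possible variant is sketched below. The function name `days_since_below` and the toy data `g` are assumptions for illustration, not part of the original data.

```r
# Hypothetical variant: days since the temperature was below `threshold`
# on a strictly earlier date (the current date itself is never counted)
days_since_below <- function(date, temp, threshold) {
  vapply(seq_along(date), function(i) {
    earlier <- date < date[i] & temp < threshold
    if (!any(earlier)) return(NA_real_)  # no earlier date below threshold
    as.numeric(date[i] - max(date[earlier]))
  }, numeric(1))
}

# Toy data for a single grid (made up for this example)
g <- data.frame(date = seq(as.Date("2000-01-01"), by = "month", length.out = 4),
                temp = c(-1, 1, -1, 2))

strict <- days_since_below(g$date, g$temp, threshold = 0)
```

To run this per grid, the same `by` or `group_by` pattern used elsewhere on this page should apply, passing the date and temp columns of each group.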

Data:

d <- expand.grid(date=seq(as.Date("2000-01-01"), as.Date("2000-09-01"), by="month"),
            grid=LETTERS[1:3])
set.seed(42)
d$temp <- sample(-1:2, nrow(d), replace=TRUE)
jay.sf
  • Thanks, Jay, with a few modifications, that worked perfectly. – threeisles Mar 08 '21 at 09:28
  • require(data.table) df <- data.frame(date=as.Date(c("01/07/2000", "01/06/2000", "01/05/2000", "01/04/2000", "01/03/2000", "01/02/2000", "01/01/2000", "01/07/2000", "01/06/2000", "01/05/2000", "01/04/2000", "01/03/2000", "01/02/2000", "01/01/2000"), "%d/%m/%Y"), temperature =c(-1, 2, 1, 0, 1, 1, -1, -2, 3, 2, 0, -1, 2, -1 ), met_square = c(1,1,1,1,1,1,1, 2,2,2,2,2,2,2)) – threeisles Mar 08 '21 at 09:29
  • setDT(df) df3 <- df[order(date),] # making sure the dates are in the right order f <- Vectorize(function(data, x) { diff(rev(with(data, date[date <= x & temperature <= temperature[date == x]]))[2:1]) }, vectorize.args="x") res <- do.call(rbind, by(df3, df3$met_square, function(g) cbind(g, last=f(g, g$date)))) res – threeisles Mar 08 '21 at 09:30
  • You're welcome @threeisles, thanks a lot for letting me know! – jay.sf Mar 08 '21 at 09:35
  • Hi @jay.sf - how would you modify your function to find the time since it was last below, say, 0 degrees? I've had a go, but it's not quite correct. – threeisles Mar 08 '21 at 10:40
  • Nice follow-up @threeisles, please see edit. – jay.sf Mar 08 '21 at 11:18

Base R option using sapply:

c(sapply(seq(nrow(df) - 1), function(x) {
  tmp <- -(1:x)
  inds <- which(df$temperature[x] >= df$temperature[tmp])[1]
  df$date[x] - df$date[tmp][inds]
}), NA)

#[1] 182  31  30  91  29  31  NA

This assumes your data is sorted in decreasing order, meaning the latest date is first (same as your example data).
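
If your data is not already in that order, a minimal sketch of sorting it first with base R's `order` might look like this. The small data frame `unsorted` is made up for illustration.

```r
# Made-up example data in arbitrary date order
unsorted <- data.frame(date = as.Date(c("2000-02-01", "2000-04-01", "2000-03-01")),
                       temperature = c(1, 0, 1))

# Latest date first, as the sapply approach expects
sorted <- unsorted[order(unsorted$date, decreasing = TRUE), ]
```

For grouped data, sorting the whole data frame by date in decreasing order also leaves each group in decreasing date order.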


To apply this by group, we can turn the above code into a function:

diff_days <- function(temp, date) {
  c(sapply(seq_len(length(temp) - 1), function(x) {
    tmp <- -(1:x)
    inds <- which(temp[x] >= temp[tmp])[1]
    date[x] - date[tmp][inds]
  }), NA)  
}

library(dplyr)
df %>% 
  group_by(met_square) %>% 
  mutate(result = diff_days(temperature, date)) %>%
  ungroup

#    date       temperature met_square result
#   <date>           <dbl>      <dbl>  <dbl>
# 1 2000-07-01          -1          1    182
# 2 2000-06-01           2          1     31
# 3 2000-05-01           1          1     30
# 4 2000-04-01           0          1     91
# 5 2000-03-01           1          1     29
# 6 2000-02-01           1          1     31
# 7 2000-01-01          -1          1     NA
# 8 2000-07-01          -2          2     NA
# 9 2000-06-01           3          2     31
#10 2000-05-01           2          2     30
#11 2000-04-01           0          2     31
#12 2000-03-01          -1          2     60
#13 2000-02-01           2          2     31
#14 2000-01-01          -1          2     NA
Ronak Shah
  • Thanks Ronak. That is really helpful. I can sort the data by decreasing date. I do have one other issue, which I didn't mention above. I have data for many hundreds of meteorological grid squares. How would you change your code to cope with multiple grid squares, so you are only looking within the same grid cell? – threeisles Mar 08 '21 at 07:27
  • df <- data.frame(date=as.Date(c("01/07/2000", "01/06/2000", "01/05/2000", "01/04/2000", "01/03/2000", "01/02/2000", "01/01/2000", "01/07/2000", "01/06/2000", "01/05/2000", "01/04/2000", "01/03/2000", "01/02/2000", "01/01/2000"), "%d/%m/%Y"), temperature =c(-1, 2, 1, 0, 1, 1, -1, -2, 3, 2, 0, -1, 2, -1 ), met_square = c(1,1,1,1,1,1,1, 2,2,2,2,2,2,2)) – threeisles Mar 08 '21 at 07:28

Here is the working code, based on Jay's answer above:

library(data.table)

df <- data.frame(date=as.Date(c("01/07/2000", "01/06/2000", "01/05/2000", "01/04/2000", "01/03/2000", "01/02/2000", "01/01/2000", "01/07/2000", "01/06/2000", "01/05/2000", "01/04/2000", "01/03/2000", "01/02/2000", "01/01/2000"), "%d/%m/%Y"), 
                 temperature =c(-1, 2, 1, 0, 1, 1, -1, -2, 3, 2, 0, -1, 2, -1 ), 
                 met_square = c(1,1,1,1,1,1,1, 2,2,2,2,2,2,2))

setDT(df)

df3 <- df[order(date),]  # making sure the dates are in the right order

# For each date x: days between x and the previous date that was at least as cold
f <- Vectorize(function(data, x) {
  diff(rev(with(data, date[date <= x & temperature <= temperature[date == x]]))[2:1])
}, vectorize.args="x")

# Apply f within each grid square, then recombine the pieces
res <- do.call(rbind, by(df3, df3$met_square, function(g) cbind(g, last=f(g, g$date))))

res
threeisles