0

I have a large dataset over many years which has several variables, but the one I am interested in is wind speed and dateTime. I want to find the time of the max wind speed for every day in the data set. I have hourly data in Posixct format, with WS as a numeric with occasional NAs. Below is a short data set that should hopefully illustrate my point, however my dateTime wasn't working out to be hourly data, but it provides enough for a sample.

dateTime <- seq(as.POSIXct("2011-01-01 00:00:00", tz = "GMT"),
            as.POSIXct("2011-01-29 23:00:00", tz = "GMT"),
            by = 60*24)
WS <- sample(0:20,1798,rep=TRUE)
WD <- sample(0:390,1798,rep=TRUE)
Temp <- sample(0:40,1798,rep=TRUE)
df <- data.frame(dateTime,WS,WD,Temp)
df$WS[WS>15] <- NA

I have previously tried creating a new column with just a posix date (minus time) to allow for day isolation, however all the things I have tried have only returned a shortened data frame with date and WS (aggregate, splitting, xts). Aggregate was only one that didn't do this, however, it gave me 23:00:00 as a constant time which isn't correct.

I have looked at How to calculate daily means, medians, from weather variables data collected hourly in R?, https://stats.stackexchange.com/questions/7268/how-to-aggregate-by-minute-data-for-a-week-into-hourly-means and others but none have answered this question, or the solutions have not returned an ideal result.

I need to compare the results of this analysis with another data frame, so hence the reason I need the actual time when the max wind speed occurred for each day in the dataset. I have a feeling there is a simple solution, however, this has me frustrated.

Dee
  • 15
  • 4

3 Answers3

2

A dplyr solution may be:

library(dplyr)

df %>%
  mutate(date = as.Date(dateTime)) %>%
  left_join(
    df %>%
      mutate(date = as.Date(dateTime)) %>%
      group_by(date) %>%
      summarise(max_ws = max(WS, na.rm = TRUE)) %>%
      ungroup(),
    by = "date"
  ) %>%
  select(-date)

#                 dateTime WS  WD Temp max_ws
# 1    2011-01-01 00:00:00 NA 313    2     15
# 2    2011-01-01 00:24:00  7 376    1     15
# 3    2011-01-01 00:48:00  3  28   28     15
# 4    2011-01-01 01:12:00 15 262   24     15
# 5    2011-01-01 01:36:00  1 149   34     15
# 6    2011-01-01 02:00:00  4 319   33     15
# 7    2011-01-01 02:24:00 15 280   22     15
# 8    2011-01-01 02:48:00 NA 110   23     15
# 9    2011-01-01 03:12:00 12  93   15     15
# 10   2011-01-01 03:36:00  3   5    0     15
Kevin Arseneau
  • 6,186
  • 1
  • 21
  • 40
  • Hi @Kevin Arseneau I have tried it on the sample data and gotten an error `by` can't contain join column `date` which is missing from RHS. Is there a fix for this? – Dee Sep 23 '17 at 04:28
  • @Dee, check the code you are running. When I run the code from your question immediately followed by mine there is no error. I think you must be missing something, my `mutate` statement inside the `left_join` creates the `date` column for the right hand side (RHS). – Kevin Arseneau Sep 23 '17 at 04:34
  • Hi @Kevin Arseneau. I closed R and restarted. It works. If all else fails restarting helps... :) Thanks! – Dee Sep 23 '17 at 05:29
1

Dee asked for: "I want to find the time of the max wind speed for every day in the data set." Other answers have calculated the max(WS) for every day, but not at which hour that occured.

So I propose the following solution with dyplr:

library(dplyr)
set.seed(12345)
dateTime <- seq(as.POSIXct("2011-01-01 00:00:00", tz = "GMT"),
                as.POSIXct("2011-01-29 23:00:00", tz = "GMT"),
                by = 60*24)
WS <- sample(0:20,1738,rep=TRUE)
WD <- sample(0:390,1738,rep=TRUE)
Temp <- sample(0:40,1738,rep=TRUE)
df <- data.frame(dateTime,WS,WD,Temp)
df$WS[WS>15] <- NA

df %>% 
  group_by(Date = as.Date(dateTime)) %>% 
  mutate(Hour = hour(dateTime),
         Hour_with_max_ws = Hour[which.max(WS)])

enter image description here

I want to highlight out, that if there are several hours with the same maximal windspeed (in the example below: 15), only the first hour with max(WS) will be shown as result, though the windspeed 15 was reached on that date at the hours 0, 3, 4, 21 and 22! So you might need a more specific logic.

enter image description here

Samuel Reuther
  • 101
  • 1
  • 3
  • Thanks @Samuel. I am hoping because my observational data has been converted from km h-1 to m s-1, it will be unique enough for this exercise – Dee Sep 25 '17 at 01:28
0

For the sake of completeness (and because I like the concise code) here is a "one-liner" using data.table:

library(data.table)
setDT(df)[, max.ws := max(WS, na.rm = TRUE), by = as.IDate(dateTime)][]
                 dateTime WS  WD Temp max.ws
   1: 2011-01-01 00:00:00 NA 293   22     15
   2: 2011-01-01 00:24:00 15  55   14     15
   3: 2011-01-01 00:48:00 NA 186   24     15
   4: 2011-01-01 01:12:00  4 300   22     15
   5: 2011-01-01 01:36:00  0 120   36     15
  ---                                       
1734: 2011-01-29 21:12:00 12 249    5     15
1735: 2011-01-29 21:36:00  9 282   21     15
1736: 2011-01-29 22:00:00 12 238    6     15
1737: 2011-01-29 22:24:00 10 127   21     15
1738: 2011-01-29 22:48:00 13 297    0     15
Uwe
  • 41,420
  • 11
  • 90
  • 134
  • thanks for the short code. It looks very elegant. I will try it on my data and see what the outcome is. – Dee Sep 25 '17 at 01:31