0

Long-time reader, first-time asker here :)

I have some data collected at specific times and dates, and there is reason to hypothesize the data roughly follows a 24-hour cycle. I would like to fit a sine wave model on my data as a function of time, so that it is possible to test if future data points fall on the predicted pattern.

I have read this, this and this response but they are not solving my problem because in my case, I'm hoping to keep the x-axis data in POSIXct date-time format. That's how the data is collected and using this format makes for an easily interpreted plot.

Here's some reproducible data that is identical to my real data:

time <- c("2022-01-01 09:20:00", "2022-01-02 11:10:00", 
          "2022-01-02 18:37:00", "2022-01-03 14:01:00", 
          "2022-01-05 06:50:00", "2022-01-06 17:03:00")

time <- as.POSIXct(time)

value <- c(3, 6, 2, 8, 4, 1)

These are plotted fine in base R:

plot(time, value)

However, now I run into trouble when I try to construct a sine regression model that would fit the time series. I'm also struggling to fully understand the parameters required by nls function. Based on the previous examples, I have tried this approach (with comments on how I understand it working):

res <- nls(value ~ A * sin(omega * time + phi) + C,        # This is the basic sine-function format
           data = data.frame(time, value),                 # This defines the data used
           start = list(A = 1, omega = 1, phi = 1, C = 1)) # This gives nls the starting values?

Here, I get an error message: "Error in Ops.POSIXt(omega, time) : '*' not defined for "POSIXt" objects" which I interpret as meaning the specific date format I would like to use is not acceptable for this type of approach. I know this, because if I simply replace the time variable with a dummy vector of integers, the model works fine and I'm able to plot it as follows:

time2 <- c(1, 2, 3, 4, 5, 6)
res <- nls(value ~ A * sin(omega * time2 + phi) + C,
       data = data.frame(time, value),
       start=list(A=1, omega=1, phi=1, C=1))

coefs <- coef(res)

fit <- function(x, a, b, c, d) {a * sin(b * x + c) + d}

plot(time2, value)
curve(fit(x, a = coefs["A"], b = coefs["omega"], 
             c = coefs["phi"], d = coefs["C"]), add=TRUE, 
             lwd=2, col="red")

I know I'm on the right track but my main question is, how can I do the above process while maintaining the time variable in POSIXct format?

As mentioned, my main order of business would be to plot the data using Ggplot2, but I can't even begin to try that before I solve the initial problem. However, any pointers on how to get started with that are greatly appreciated! :)

1 Answers1

0

I would probably just generate a numeric number of days from an arbitrary origin time and use that. You can then modify your fit function so that it converts date-times to predicted values. You can then easily make a data frame of predictions from your model and plot that.

df <- data.frame(time = time, value = value)

origin <- as.POSIXct("2022-01-01 00:00:00")

df$days <- as.numeric(difftime(time, origin, unit = "day"))

res <- nls(value ~ A * sin(omega * days + phi) + C,  
           data = df, 
           start = list(A = 1, omega = 1, phi = 1, C = 1))

fit <- function(res, newdata) {
  
  x <- as.numeric(difftime(origin, newdata$time, units = "days"))
  C <- as.list(coef(res))
  C$A * sin(C$omega * x + C$phi) + C$C
}

new_df <- data.frame(time = origin + as.difftime(new_times, units = "days"))
new_df$value <- fit(res, new_df)

ggplot(df, aes(time, value)) +
  geom_point() +
  geom_line(data = new_df, colour = "gray") +
  theme_bw()

enter image description here

Allan Cameron
  • 147,086
  • 7
  • 49
  • 87
  • Ah, that difftime approach makes so much sense, thank you! :) However, could you still elaborate a little bit on that "new_times" variable that you use when making the new data.frame for ggplotting? Not sure if you intended it to contain some of my own real-world data or how did you use it? Without it, the plot is not reproducible as it is now. – PauliOhukainen Feb 09 '22 at 21:04
  • @PauliOhukainen the new data is simply a sequence of date-times over which you wish to predict / plot. In the example I did it over the seven days of your data, but it would work for any vector of date-times. I'm sorry I missed out the `new_times` variable. It's just something like `new_times <- seq(0, 7, 0.1)` – Allan Cameron Feb 09 '22 at 21:12