Is there a way to simulate time series data with a specific rolling mean and autocorrelation in R?

Question

I have an existing time series (1000 samples) and calculated its rolling mean with the filter() function in R, averaging over a window of 30 samples, to create a "smoothed" version of the series. Now I would like to create artificial data that "look like" the original time series, i.e., are somewhat noisy, and that yield the same rolling mean when I apply the same filter() function to them. In short, I would like to simulate a time series with the same overall course as an existing time series, but not the exact same values. The overall goal is to investigate whether certain methods can detect similarity of trends between time series even when the fluctuations around the trend differ.

To provide some data, my time series looks somewhat like this:

set.seed(576)
ts <- arima.sim(model = list(order = c(1,0,0), ar = .9), n = 1000) + 900

# save in dataframe
df <- data.frame("ts" = ts)

# plot the data
plot(ts, type = "l")

The filter function produces the rolling mean:

# centered, circular moving average; stats:: avoids masking by dplyr::filter
my_filter <- function(x, n = 30) {stats::filter(x, rep(1 / n, n), sides = 2, circular = TRUE)}
df$rolling_mean <- my_filter(df$ts)
lines(df$rolling_mean, col = "red")
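
To make the behaviour of the filter concrete, here is a minimal sketch on a toy vector with a window of 3 instead of 30. The names `x`, `w`, `rm_circ` and `rm_open` are only illustrative and do not appear in my actual code.

```r
# Tiny worked example: a 3-point centered moving average on five values.
x <- c(1, 2, 3, 4, 5)
w <- rep(1 / 3, 3)

# sides = 2 centers the window; circular = TRUE wraps around at the ends
rm_circ <- as.numeric(stats::filter(x, w, sides = 2, circular = TRUE))

# Without wrapping, the end points have no complete window and become NA
rm_open <- as.numeric(stats::filter(x, w, sides = 2, circular = FALSE))

print(rm_circ)  # first value averages 5, 1, 2; last value averages 4, 5, 1
print(rm_open)
```

With circular = TRUE, the first and last values of the rolling mean therefore mix data from both ends of the series. With an even window such as 30, the window cannot be exactly symmetric; according to ?filter, more of the filter is then forward in time than backward.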

To simulate data, I have tried the following:

Adding random noise to the rolling mean.

df$sim1 <- df$rolling_mean + rnorm(1000, sd = sd(df$ts))

lines(df$sim1, col = "blue")

df$sim1_rm <- my_filter(df$sim1)
lines(df$sim1_rm, col = "green")

The problems are that a) the variance of the simulated values is higher than the variance of the original values, b) the rolling average, although quite similar to the original, sometimes deviates quite a bit from it, and c) there is no autocorrelation. An autocorrelation structure in the data would be desirable, since the simulated series is supposed to resemble the original data.

Edit: Problem a) can be solved by using sd = sqrt(var(df$ts)-var(df$rolling_mean)) instead of sd = sd(df$ts).
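
The three problems can be quantified with a few diagnostics. The following is only a sketch under my own assumptions: it regenerates the example data as a plain numeric vector `x`, uses the reduced noise SD from the edit, and the names `var_ratio`, `max_rm_gap` and `lag1` are made up for illustration.

```r
set.seed(576)
n <- 1000
x <- as.numeric(arima.sim(model = list(order = c(1, 0, 0), ar = 0.9), n = n)) + 900

# Centered, circular 30-point moving average, returned as a plain vector
my_filter <- function(x, n = 30) {
  as.numeric(stats::filter(x, rep(1 / n, n), sides = 2, circular = TRUE))
}
rolling_mean <- my_filter(x)

# Approach 1 with the reduced noise SD from the edit
noise_sd <- sqrt(var(x) - var(rolling_mean))
sim1 <- rolling_mean + rnorm(n, sd = noise_sd)

# a) ratio of simulated to original variance (1 means equal)
var_ratio <- var(sim1) / var(x)

# b) largest absolute gap between the two rolling means
max_rm_gap <- max(abs(my_filter(sim1) - rolling_mean))

# c) lag-1 autocorrelation of the fluctuations around the trend
lag1 <- function(v) acf(v, lag.max = 1, plot = FALSE)$acf[2]
acf_orig <- lag1(x - rolling_mean)
acf_sim1 <- lag1(sim1 - rolling_mean)

print(c(var_ratio = var_ratio, max_rm_gap = max_rm_gap,
        acf_orig = acf_orig, acf_sim1 = acf_sim1))
```

Because the filter is linear, my_filter(sim1) equals my_filter(rolling_mean) + my_filter(noise). Some gap in b) is therefore expected even without any noise, since smoothing an already smoothed series changes it again.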

I tried arima.sim(), which seems like an obvious choice for specifying the autocorrelation that should be present in the data. I fitted a model to the original data with arima() and used the estimated parameters as input for arima.sim().

ts_arima <- arima(ts, order = c(1,0,1))

my_ar <- ts_arima$coef["ar1"]
my_ma <- ts_arima$coef["ma1"]
my_intercept <- ts_arima$coef["intercept"]

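# pass the fitted innovation SD as well; otherwise arima.sim() uses sd = 1
df$sim2 <- arima.sim(model = list(order = c(1,0,1), ar = my_ar, ma = my_ma), n = 1000, sd = sqrt(ts_arima$sigma2)) + my_intercept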

plot(df$ts)
lines(df$sim2, col = "blue")

The resulting time series is very different from the original. Maybe a higher order for ar and ma in arima.sim() would solve this, but I think a whole different method might be more appropriate.
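
One possible combination of the two attempts, written only as a sketch and not as a confirmed solution: keep the rolling mean as the trend and replace the white noise with autocorrelated noise fitted to the fluctuations around it. The AR(1) order for the fluctuations and the names `resid_orig`, `resid_sim` and `sim3` are assumptions made for this illustration.

```r
set.seed(576)
n <- 1000
x <- as.numeric(arima.sim(model = list(order = c(1, 0, 0), ar = 0.9), n = n)) + 900

my_filter <- function(x, n = 30) {
  as.numeric(stats::filter(x, rep(1 / n, n), sides = 2, circular = TRUE))
}
rolling_mean <- my_filter(x)

# Fluctuations of the original series around its rolling mean
resid_orig <- x - rolling_mean

# Fit a zero-mean AR(1) to those fluctuations (the order is an assumption)
fit <- arima(resid_orig, order = c(1, 0, 0), include.mean = FALSE)
phi <- unname(coef(fit)["ar1"])
innov_sd <- sqrt(fit$sigma2)

# Simulate new fluctuations with the fitted autocorrelation, then add the trend
resid_sim <- as.numeric(arima.sim(model = list(ar = phi), n = n, sd = innov_sd))
sim3 <- rolling_mean + resid_sim

# Compare the lag-1 autocorrelation of original and simulated fluctuations
lag1 <- function(v) acf(v, lag.max = 1, plot = FALSE)$acf[2]
print(c(original = lag1(resid_orig), simulated = lag1(resid_sim)))
```

This only addresses the missing autocorrelation. The rolling mean of sim3 still differs from the original rolling mean by my_filter(resid_sim) plus the effect of smoothing the trend a second time, so the requirement of an identical rolling mean is not met by this sketch.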

Regarding problem 1a) (variance higher in simulation than in original): I guess this is because there is also variance in the rolling mean. You can think of the total variance as the sum of the variance of the "smoothed" data and the variance of the noise around it. So when you use the SD of the total data to generate noise around the rolling mean, you end up with a variance that includes the variance of the smoothed data twice. Maybe this would help: `df$sim1 <- df$rolling_mean + rnorm(1000, sd = sqrt(var(df$ts)-var(df$rolling_mean)))` - benimwolfspelz, Oct 27 '20 at 18:24
@benimwolfspelz Thank you, that is a really good point. I will edit the question accordingly. - Bernadette Denk, Oct 27 '20 at 19:12
