
I have an xts time series object with one value on the first of each month (an aggregated sum for the whole month), covering four years.

When I run `stats::acf()` on it, I get a plot whose lag values (x axis) run into the hundreds of thousands. How can that be if I only have 48 values in my time series? If the lag is measured in some time unit, which one is it, and how can I change it?

Example code:

library(dplyr)
library(lubridate)
library(xts)

set.seed(100)

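# 48 monthly observations (2012-2015), each dated the 1st of the month
test <- data.frame(y = c(rep(2012, 12), rep(2013, 12), rep(2014, 12), rep(2015, 12)),
                   m = rep(seq(1, 12, 1), 4), d = rep(1, 48), value = runif(48, 0, 100))

# Build a Date column with lubridate::ymd() and keep only date and value
test <- test %>%
  mutate(date = ymd(paste(y, m, d, sep = "-"))) %>% 
  select(date, value)

# xts object indexed by the monthly dates
test <- xts(test$value, test$date)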

acf(test)

(Plot: ACF of `test`; the lag values on the x axis run into the hundreds of thousands.)

Joshua Ulrich
Timm S.
  • `acf` is giving you the acf at lags of days. There are 86,400 seconds per day, and your date index is stored like POSIXct, i.e. as the number of seconds since 1970-01-01. – eipi10 Feb 29 '16 at 17:08
  • That seems plausible. Dividing the lag by the number of seconds in a day gives roughly whole numbers. See also my comment on Roland's answer below. – Timm S. Mar 01 '16 at 08:18
  • On second thought, that still seems strange. Divisibility by 86400 suggests that the lag is in days, but the data only has values on the first of each month. Wouldn't the lag need to be in months? – Timm S. Mar 01 '16 at 09:08
  • Yes, I agree. `xts` seems to ignore the `frequency` argument. For example, I get the same `acf` graph when I do `test <- xts(test$value, test$date, frequency=30.5*86400)` (or any other value of `frequency`). Also, `frequency(test)` gives the same result, regardless of what `frequency` I use when creating the `test` xts object. – eipi10 Mar 01 '16 at 16:21
  • In addition, when I use `as.POSIXct` instead of `ymd` in the `mutate` statement, I get an `acf` graph with lags in hours, rather than days. I'm not that familiar with `xts` and I'm not sure why it behaves this way or how to change it. – eipi10 Mar 01 '16 at 16:22
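The puzzle in these comments (daily lags for monthly data) has a plausible arithmetic explanation, sketched below in base R. It is an assumption and has not been checked against the `frequency.zoo` source: because months are 28 to 31 days long, no constant one-month step fits the index, and the largest step that divides every gap between consecutive dates is one day, i.e. 86400 seconds. The `gcd()` helper is hypothetical, written only for this illustration.

```r
# Index like the question's: the 1st of each month, Jan 2012 to Dec 2015
dates <- seq(as.Date("2012-01-01"), by = "month", length.out = 48)

# Seconds since 1970-01-01 at UTC midnight (UTC avoids daylight-saving shifts)
secs <- as.numeric(as.POSIXct(dates, tz = "UTC"))

# Distinct gaps between consecutive observations, in seconds
gaps <- unique(diff(secs))
gaps / 86400    # month lengths in days, between 28 and 31

# Hypothetical helper: greatest common divisor of two whole numbers (Euclid)
gcd <- function(a, b) if (b == 0) a else gcd(b, a %% b)

# Largest step that divides every gap: one day, in seconds
step <- Reduce(gcd, gaps)
step            # 86400
```

If the index were instead built in a local time zone with daylight-saving changes, some gaps would be an hour shorter or longer and the same calculation would give 3600, which would be consistent with the observation that `as.POSIXct` produces lags in hours.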

1 Answer


From the source code of `acf` we can see that the lags are calculated like this:

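# `test` is the xts object from the question
sampleT <- as.integer(nrow(test))                      # number of observations: 48
nser <- as.integer(ncol(test))                         # number of series: 1
lag.max <- floor(10 * (log10(sampleT) - log10(nser)))  # default lag.max: 16
x.freq <- frequency(test)                              # frequency derived from the time index
lag <- outer(0:lag.max, 1/x.freq)                      # lag k is reported as k / frequency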
#         [,1]
# [1,]       0
# [2,]   86400
# [3,]  172800
# [4,]  259200
# [5,]  345600
# [6,]  432000
# [7,]  518400
# [8,]  604800
# [9,]  691200
#[10,]  777600
#[11,]  864000
#[12,]  950400
#[13,] 1036800
#[14,] 1123200
#[15,] 1209600
#[16,] 1296000
#[17,] 1382400

The lag unit is the reciprocal of the frequency unit: lag k is reported as k / frequency(test), and here each step is 86400. To understand how that frequency value is calculated, you need to dive into the source code of `frequency.zoo`, which does something I find difficult to understand at first glance.
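The same arithmetic can be reproduced with base R alone. This sketch hard-codes `x.freq` as `1/86400`, the value implied by the printed lags above (an assumption, since it is not computed from an xts object here), and uses the question's 48 observations of a single series.

```r
# Question's data: 48 monthly observations of one series
sampleT <- 48L
nser <- 1L

# Default lag.max used by acf() when none is supplied
lag.max <- floor(10 * (log10(sampleT) - log10(nser)))
lag.max                  # 16

# Assumed frequency: one observation per 86400 time units (seconds)
x.freq <- 1 / 86400

# Each lag k is reported as k / frequency, i.e. k * 86400
lag <- outer(0:lag.max, 1 / x.freq)

# Dividing by the seconds in a day recovers lags 0, 1, ..., 16 in days
lag[, 1] / 86400
```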

Roland
  • From your list of lag values and @eipi10's comment above, it's now clear that the numbers are multiples of 86400 = 60 x 60 x 24, the number of seconds per day, i.e. I have to divide the lag value by 86400 to get the lag in days. – Timm S. Mar 01 '16 at 08:18
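If the goal is lags counted in months rather than in seconds or days, one possible workaround (not part of the answer above) is to pass `acf()` something other than the xts object, so that its calendar index is not used. The sketch regenerates values the way the question does, so it runs without xts; with the question's object, `coredata(test)` from zoo should give the plain values.

```r
# Values generated as in the question (no time index needed here)
set.seed(100)
value <- runif(48, 0, 100)

# Option 1: plain numeric vector; lags are counted in observations (months)
a1 <- acf(value, plot = FALSE)
a1$lag[1:4]          # 0 1 2 3

# Option 2: regular ts with 12 observations per year; lags are in years
x <- ts(value, start = c(2012, 1), frequency = 12)
a2 <- acf(x, plot = FALSE)
a2$lag[1:4] * 12     # 0 1 2 3, i.e. months
```

With the default `plot = TRUE`, option 2 labels the x axis in years, so a lag of 1 corresponds to 12 months.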