Here is my toy time series data:

library(tidyverse); library(lubridate); library(tsibble); library(feasts)

df <- tibble::tribble(
         ~date,     ~A,     ~B,     ~C,
   "1/31/2010",     NA,  0.017,     NA,
   "2/28/2010",     NA,  0.027,     NA,
   "3/31/2010",     NA,  0.003,  0.003,
   "4/30/2010", -0.022,  0.018,  0.018,
   "5/31/2010", -0.036,   0.02,   0.02,
   "6/30/2010", -0.046,  0.023,  0.023,
   "7/31/2010",     NA,  0.027,  0.027,
   "8/31/2010", -0.022,  0.008,  0.008,
   "9/30/2010",  0.059, -0.003, -0.003,
  "10/31/2010",  0.024,  0.058,  0.058,
  "11/30/2010",     NA,  0.023,     NA,
  "12/31/2010",     NA,  0.014,     NA
  )
    

I want to calculate the autocorrelation (ACF) of multiple time series. Ignoring the imputation part, I need to:

  1. Remove the variables with in-between NAs (not those at the start and end of the time series), like the NA on 7/31/2010 for A. So in this case, remove variable A.
  2. Calculate the autocorrelations, potentially using the ACF function from the feasts package, on B and C.

I started here and got stuck:

df %>%
      mutate(date = mdy(date)) %>% 
      pivot_longer(cols = -date) %>% 
      as_tsibble(key = name, index = date) %>% 
      ACF() 

The expected output would have the autocorrelations of every possible series by lag. For example, B will have 10-11 values for 10 lags, and the same for series C.

Geet
  • *"Remove the variables with inbetween NAs "* Can you explain what you mean by that? Which variable? Do you want to omit that specific time point from `A`? Do you want to omit `A` altogether? – Maurits Evers Jun 25 '20 at 11:53
  • Omit A altogether. I'm finding it difficult to detect variables with in-between NAs. – Geet Jun 25 '20 at 11:53
  • Is it possible to have a variable with NA only at the start and NOT at the end? If so, do you remove them? – Sotos Jun 25 '20 at 12:19
  • @Geet You can achieve the first part using `rle` (see my answer); not sure what you expect for part 2. What is your expected output? – Maurits Evers Jun 25 '20 at 12:50
  • @Sotos Yes, it is possible to have a variable with NAs only at the start and NOT at the end. I would not remove such a variable, since I only want to remove variables that have in-between NAs. – Geet Jun 25 '20 at 13:11

1 Answer

Regarding part 1

We can make use of rle (run-length encoding). Let's define a concise custom function, has_middle_NA:

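has_middle_NA <- function(x) {
    # rle() on the NA pattern gives alternating runs of TRUE (NA) and FALSE
    rl <- rle(is.na(x))$values
    # Ignore the first and last run (leading/trailing NAs); any TRUE left is an interior gap
    any(rl[-c(1, length(rl))])
}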
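As a quick sanity check of what the function flags, here is a small sketch that rebuilds the three value columns from the question as plain vectors (the names `A`, `B`, `C` and `flags` are just for illustration) and applies has_middle_NA to each. Only A has an NA run that is neither leading nor trailing.

```r
# Value columns of the toy data from the question
A <- c(NA, NA, NA, -0.022, -0.036, -0.046, NA, -0.022, 0.059, 0.024, NA, NA)
B <- c(0.017, 0.027, 0.003, 0.018, 0.02, 0.023, 0.027, 0.008, -0.003, 0.058, 0.023, 0.014)
C <- c(NA, NA, 0.003, 0.018, 0.02, 0.023, 0.027, 0.008, -0.003, 0.058, NA, NA)

has_middle_NA <- function(x) {
  # Alternating runs of TRUE (NA) and FALSE (observed)
  rl <- rle(is.na(x))$values
  # Drop the first and last run; a remaining TRUE run is an interior gap
  any(rl[-c(1, length(rl))])
}

flags <- sapply(list(A = A, B = B, C = C), has_middle_NA)
flags
# A = TRUE, B = FALSE, C = FALSE
```

For A the NA runs are TRUE, FALSE, TRUE, FALSE, TRUE; dropping the outer two leaves the TRUE run for 7/31/2010, so A is flagged. C only has leading and trailing NA runs, so it is kept.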

Then

df %>%
    group_by(date) %>%  # grouping columns are always kept by select_if()
    select_if(~ !has_middle_NA(.x)) %>%
    ungroup()
## A tibble: 12 x 3
#   date            B      C
#   <chr>       <dbl>  <dbl>
# 1 1/31/2010   0.017 NA
# 2 2/28/2010   0.027 NA
# 3 3/31/2010   0.003  0.003
# 4 4/30/2010   0.018  0.018
# 5 5/31/2010   0.02   0.02
# 6 6/30/2010   0.023  0.023
# 7 7/31/2010   0.027  0.027
# 8 8/31/2010   0.008  0.008
# 9 9/30/2010  -0.003 -0.003
#10 10/31/2010  0.058  0.058
#11 11/30/2010  0.023 NA
#12 12/31/2010  0.014 NA

This removes all columns with NAs that are not leading or trailing.
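In dplyr 1.0.0 and later, select_if() is superseded. A sketch of the same filter using select() with the where() helper (assuming dplyr >= 1.0.0) does not need the group_by(date) trick: the character date column has no NAs, so has_middle_NA() returns FALSE for it and it is kept anyway. The name `kept` is just for illustration.

```r
library(dplyr)

has_middle_NA <- function(x) {
  rl <- rle(is.na(x))$values
  any(rl[-c(1, length(rl))])
}

df <- tibble::tribble(
         ~date,     ~A,     ~B,     ~C,
   "1/31/2010",     NA,  0.017,     NA,
   "2/28/2010",     NA,  0.027,     NA,
   "3/31/2010",     NA,  0.003,  0.003,
   "4/30/2010", -0.022,  0.018,  0.018,
   "5/31/2010", -0.036,   0.02,   0.02,
   "6/30/2010", -0.046,  0.023,  0.023,
   "7/31/2010",     NA,  0.027,  0.027,
   "8/31/2010", -0.022,  0.008,  0.008,
   "9/30/2010",  0.059, -0.003, -0.003,
  "10/31/2010",  0.024,  0.058,  0.058,
  "11/30/2010",     NA,  0.023,     NA,
  "12/31/2010",     NA,  0.014,     NA
  )

# Keep every column whose NAs (if any) are only leading or trailing
kept <- df %>% select(where(~ !has_middle_NA(.x)))
names(kept)
# "date" "B" "C"
```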

Regarding part 2

It's still not entirely clear to me what you're trying to do with the ACF given your data, but perhaps this helps.

The key is to treat your data as monthly data, ignoring the day. We can then:

  • Convert the dates to monthly values using zoo::as.yearmon,
  • Select the columns that have no NAs "in the middle",
  • Reshape from wide to long and create a tsibble for every series,
  • Use feasts::ACF to calculate the ACF for every series and store the results in a list column of tsibbles.
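library(tsibble)
library(tidyverse)
library(feasts)
library(zoo)

res <- df %>%
    # Treat the data as monthly by converting to zoo's year-month class
    mutate(date = as.yearmon(date, format = "%m/%d/%Y")) %>%
    # Drop series with NAs in the middle (group_by keeps the date column)
    group_by(date) %>%
    select_if(~ !has_middle_NA(.x)) %>%
    ungroup() %>%
    pivot_longer(-date) %>%
    # One row per series, with its observations in a list column
    group_by(name) %>%
    nest() %>%
    mutate(
        data = map(data, ~ as_tsibble(.x, index = date)),
        ACF = map(data, ~ ACF(.x, value)))
res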
## A tibble: 2 x 3
## Groups:   name [2]
#  name  data               ACF
#  <chr> <list>             <list>
#1 B     <tsibble [12 × 2]> <tsibble [10 × 2]>
#2 C     <tsibble [12 × 2]> <tsibble [7 × 2]>
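Since the expected output is the autocorrelations of every series by lag, a variant worth trying is to skip the nesting and build one keyed tsibble, letting feasts::ACF() compute the ACF per key and return a single long table. This sketch swaps in tsibble's yearmonth() for zoo::as.yearmon() and uses lubridate::mdy() to parse the dates; it also assumes that after step 1 only leading or trailing NAs remain, so filtering them out leaves each series gap-free. The name `acf_tbl` is illustrative.

```r
library(dplyr)
library(tidyr)
library(lubridate)
library(tsibble)
library(feasts)

has_middle_NA <- function(x) {
  rl <- rle(is.na(x))$values
  any(rl[-c(1, length(rl))])
}

df <- tibble::tribble(
         ~date,     ~A,     ~B,     ~C,
   "1/31/2010",     NA,  0.017,     NA,
   "2/28/2010",     NA,  0.027,     NA,
   "3/31/2010",     NA,  0.003,  0.003,
   "4/30/2010", -0.022,  0.018,  0.018,
   "5/31/2010", -0.036,   0.02,   0.02,
   "6/30/2010", -0.046,  0.023,  0.023,
   "7/31/2010",     NA,  0.027,  0.027,
   "8/31/2010", -0.022,  0.008,  0.008,
   "9/30/2010",  0.059, -0.003, -0.003,
  "10/31/2010",  0.024,  0.058,  0.058,
  "11/30/2010",     NA,  0.023,     NA,
  "12/31/2010",     NA,  0.014,     NA
  )

acf_tbl <- df %>%
  # Step 1: drop series with interior NAs (here A)
  select(where(~ !has_middle_NA(.x))) %>%
  # Treat the data as monthly: month-end dates become a year-month index
  mutate(date = yearmonth(mdy(date))) %>%
  pivot_longer(-date, names_to = "name", values_to = "value") %>%
  # Only leading/trailing NAs remain, so dropping them leaves no gaps
  filter(!is.na(value)) %>%
  as_tsibble(key = name, index = date) %>%
  # Step 2: one ACF per series, stacked into a single table (name, lag, acf)
  ACF(value)

acf_tbl
```

Because C only has 8 observed months, it should end up with fewer lags than B, in line with the 10 and 7 rows in the nested output above.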

Maurits Evers
  • Thanks for the first part. Regarding part 2, I want to calculate the autocorrelations of B and C, potentially using the ACF function from the feasts package. – Geet Jun 25 '20 at 13:17
  • @Geet Yes, I'm familiar with what an ACF is. The question is: what do you expect as output? What do you want to return? – Maurits Evers Jun 25 '20 at 13:17
  • @Geet Additionally, `feasts::ACF` will complain about gaps in your data. How do you want to deal with that? – Maurits Evers Jun 25 '20 at 13:23
  • Yeah, I'm actually not sure what to do about that either :(. Maybe the acf function from stats will help? – Geet Jun 25 '20 at 13:24
  • And I guess, if that's possible, the expected output would be the autocorrelations of every possible series by lag. For example, B will have 10-11 values for 10 lags, and the same for series C, and so on. – Geet Jun 25 '20 at 13:26
  • @Geet Base R's `acf` will also complain about gaps (or rather: missing values), so that's not going to help. I'm not sure whether what you're asking makes sense. – Maurits Evers Jun 25 '20 at 13:41
  • @Geet Please take a look at my edit. You can calculate an ACF if you treat your data as monthly data (ignoring the day component). – Maurits Evers Jun 25 '20 at 13:50
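The gap complaint and the monthly fix discussed in these comments come down to the interval tsibble infers from the index. A sketch, assuming tsibble and lubridate, using series B only (the names `dates`, `b`, `daily` and `monthly` are illustrative): with month-end Date values the spacings are 28, 30 or 31 days, whose greatest common step is one day, so every series looks like it has implicit gaps; a yearmonth index gives a regular monthly interval.

```r
library(lubridate)
library(tsibble)

# Month-end dates of 2010, as in the toy data
dates <- mdy(c("1/31/2010", "2/28/2010", "3/31/2010", "4/30/2010",
               "5/31/2010", "6/30/2010", "7/31/2010", "8/31/2010",
               "9/30/2010", "10/31/2010", "11/30/2010", "12/31/2010"))
# Series B from the toy data (no missing values)
b <- c(0.017, 0.027, 0.003, 0.018, 0.02, 0.023,
       0.027, 0.008, -0.003, 0.058, 0.023, 0.014)

# Date index: spacings of 28/30/31 days only share a 1-day step
daily <- tsibble(date = dates, value = b, index = date)
# yearmonth index: consecutive months, one step apart
monthly <- tsibble(date = yearmonth(dates), value = b, index = date)

interval(daily)    # 1 day
interval(monthly)  # 1 month
has_gaps(daily)    # .gaps is TRUE: most calendar days are absent
has_gaps(monthly)  # .gaps is FALSE
```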