
Problem description
I work with thrice-monthly data a lot. Thrice monthly (roughly every 10 days, also referred to as a dekad) is the typical reporting interval for water-related data in the former Soviet Union and for many other climate- and water-related data sets around the world. Below is an example data set with 2 variables:

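> library(lubridate)  # floor_date(), day()
> library(tibble)     # tibble(), add_row()
> library(dplyr)      # %>%
> date = unique(floor_date(seq.Date(as.Date("2019-01-01"), as.Date("2019-12-31"), 
                                    by="day"), "10days"))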
> example_data <- tibble(
    date = date[day(date)!=31],  
    value = seq(1,36,1),  
    var = "A") %>%
    add_row(tibble(
      date = date[day(date)!=31],  
      value = seq(10,360,10),  
      var = "B")) 
> example_data
# A tibble: 72 x 3
# Groups:   var [2]
   date       value var  
   <ord>      <dbl> <chr>
 1 2019-01-01     1 A    
 2 2019-01-01    10 B    
 3 2019-01-11     2 A    
 4 2019-01-11    20 B    
 5 2019-01-21     3 A    
 6 2019-01-21    30 B    
 7 2019-02-01     4 A    
 8 2019-02-01    40 B    
 9 2019-02-11     5 A    
10 2019-02-11    50 B    
# … with 62 more rows

In the example I chose the 1st, 11th, and 21st to date the dekads, but it would actually be more appropriate to index them as dekad 1 to 3 per month (analogous to months 1 to 12 per year) or as dekad 1 to 36 per year (analogous to day of the year). The most elegant solution would be a proper date class for dekadal data, like yearmonth in tsibble. However, lubridate does not appear to plan support for dekadal data in the near future (github conversation).
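The two indexing schemes described above can be sketched in base R. The helper names `dekad_of_month()` and `dekad_of_year()` are hypothetical and do not come from lubridate or tsibble; the sketch assumes that day 31 belongs to the third dekad of its month.

```r
# Hypothetical helpers (not part of lubridate or tsibble)

# Dekad within the month: 1, 2 or 3 (day 31 is kept in dekad 3)
dekad_of_month <- function(x) {
  mday <- as.POSIXlt(x, tz = "UTC")$mday
  pmin((mday - 1L) %/% 10L + 1L, 3L)
}

# Dekad within the year: 1 to 36
dekad_of_year <- function(x) {
  mon <- as.POSIXlt(x, tz = "UTC")$mon  # 0-based month
  3L * mon + dekad_of_month(x)
}

d <- as.Date(c("2019-01-01", "2019-01-11", "2019-01-31", "2019-12-21"))
dekad_of_month(d)  # 1 2 3 3
dekad_of_year(d)   # 1 2 3 36
```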

I have workflows using tsibble and timetk which work well with monthly data but it would really be more appropriate to work with the original dekadal time steps and I'm looking for a way to be able to use the tidyverse functions with dekadal data with as few cumbersome workarounds as possible.
The problem with using daily dates for dekadal data in tsibble is that it identifies the time interval as daily, so you get a lot of data gaps between the 3 values per month:

> library(tsibble)
> example_data_tsbl <- as_tsibble(example_data, index = date, key = var)
> count_gaps(example_data_tsbl, .full = FALSE)
# A tibble: 70 x 4
   var   .from      .to           .n
   <chr> <date>     <date>     <int>
 1 A     2019-01-02 2019-01-10     9
 2 A     2019-01-12 2019-01-20     9
 3 A     2019-01-22 2019-01-31    10
# … 
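To illustrate why the daily interval is inferred, here is a small base R sketch. It compares the spacing of the dekad start dates in days with the spacing of an integer dekad counter. The counter `dekad_index` is a hypothetical construction for illustration only, not something tsibble provides.

```r
# Dekad start dates for January to March 2019 (1st, 11th and 21st of each month)
dates <- as.Date(paste(2019, rep(1:3, each = 3), c(1, 11, 21), sep = "-"))

# Spacing in days is irregular because the third dekad has 8 to 11 days
diff(as.numeric(dates))

# Hypothetical integer index: 36 steps per year, 3 per month
lt <- as.POSIXlt(dates, tz = "UTC")
dekad_index <- (lt$year + 1900L) * 36L + lt$mon * 3L +
  pmin((lt$mday - 1L) %/% 10L, 2L)

# Spacing in dekads is regular: every step is 1
diff(dekad_index)
```

With a regular step of 1, an index of this kind would not produce the gaps shown above, which is the motivation for a dedicated dekad index.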

Here's what I did so far:

  1. I saw here that ordered factors can be defined as indices in tsibble, but timetk does not recognise factors as indices. timetk suggests defining custom indices instead (see 2.).
  2. It is possible to add custom indices to tsibble, but I haven't found any examples and I don't understand how to use these functions (a vignette is still planned). I have started reading the code to work out how to use the functions to support dekadal data, but I'm a bit overwhelmed.

Questions

  • Will dekadal custom indices in tsibble behave similarly to yearmonth or yearweek?

  • Would anyone here have an example to share on how to add custom indices to tsibble?

  • Or does anyone know of another way to elegantly handle dekadal data in the tidyverse?

mabe

2 Answers


This doesn't discuss tsibbles but it was too long for a comment and does provide an alternative.

zoo can do this either (1) with the code below, which does not require creating a new class, or (2) by creating a new class and methods. For the second alternative, following the methods that the yearmon class has would be sufficient. See here. zoo itself does not have to be modified.

As we see below, for the first approach dates will be shown as year(cycle) where cycle is 1, 2, ..., 36. Internally the dates are stored as year + (cycle-1)/36 .

It would also be possible to use the ts class if the dates are consecutive month thirds (or, if they are not, if you don't mind having NAs inserted to make them so). For that, use as.ts(z).
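As a rough illustration of the year + (cycle-1)/36 representation, the following base R sketch builds a ts object with 36 periods per year directly, rather than via as.ts(z). The values are made up for illustration.

```r
# Base R only: five consecutive dekads starting at the first dekad of 2019
x <- ts(c(1, 2, 3, 4, 5), start = c(2019, 1), frequency = 36)

time(x)       # time values equal to 2019 + (cycle - 1) / 36
cycle(x)      # position within the year, here cycles 1 to 5
frequency(x)  # 36 dekads per year
```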

Start a fresh session with no packages loaded and then copy and paste the input DF shown in the Note at the end and then this code. Date2dek will convert a Date vector or a character vector representing dates in standard yyyy-mm-dd format to a dek format which is described above. dek2Date performs the inverse transformation. It is not actually used below but might be useful.

library(zoo)

# convert Date or yyyy-mm-dd char vector
Date2dek <- function(x, ...) with(as.POSIXlt(x, tz="GMT"), 
  1900 + year + (mon + ((mday >= 11) + (mday >= 21)) / 3) / 12)

dek2Date <- function(x, ...) {     # not used below but shows inverse
  cyc <- round(36 * (as.numeric(x) %% 1)) + 1
  if(all(is.na(x))) return(as.Date(x))
  month <- (cyc - 1) %/% 3 + 1
  day <- 10 * ((cyc - 1) %% 3) + 1
  year <- floor(x + .001)
  ix <- !is.na(year)
  as.Date(paste(year[ix], month[ix], day[ix], sep = "-")) 
}

# DF given in Note below
z <- read.zoo(DF, split = "var", FUN = Date2dek, regular = TRUE, freq = 36)
z

The result is the following zooreg object:

        A  B
2019(1) 1 10
2019(2) 2 20
2019(3) 3 30
2019(4) 4 40
2019(5) 5 50
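The second approach mentioned earlier, a dedicated class with its own methods, might look roughly like the base R sketch below. The class name "dekad" and the constructor `as_dekad()` are hypothetical. Only formatting, printing and conversion back to Date are shown, and a complete class would need more methods (subsetting, arithmetic and so on), as yearmon has.

```r
# Hypothetical class "dekad"; values are stored as year + (cycle - 1) / 36
as_dekad <- function(x) {
  lt <- as.POSIXlt(x, tz = "UTC")
  cyc <- 3L * lt$mon + pmin((lt$mday - 1L) %/% 10L, 2L) + 1L
  structure(1900 + lt$year + (cyc - 1) / 36, class = "dekad")
}

# Display as year(cycle), e.g. "2019(5)"
format.dekad <- function(x, ...) {
  x <- unclass(x)
  year <- floor(x + 0.001)
  cyc <- round(36 * (x - year)) + 1
  paste0(year, "(", cyc, ")")
}

print.dekad <- function(x, ...) {
  print(format(x), quote = FALSE)
  invisible(x)
}

# Convert back to the first day of the dekad
as.Date.dekad <- function(x, ...) {
  x <- unclass(x)
  year <- floor(x + 0.001)
  cyc <- round(36 * (x - year))  # 0-based cycle
  as.Date(sprintf("%d-%02d-%02d",
                  as.integer(year),
                  as.integer(cyc %/% 3 + 1),
                  as.integer(10 * (cyc %% 3) + 1)))
}

d <- as_dekad(c("2019-01-01", "2019-02-11", "2019-12-31"))
format(d)   # "2019(1)" "2019(5)" "2019(36)"
as.Date(d)  # "2019-01-01" "2019-02-11" "2019-12-21"
```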

Note

DF <- data.frame(
  date = as.Date(ISOdate(2019, rep(1:2, 3:2), c(1, 11, 21))), 
  value = c(1:5, 10*(1:5)), 
  var = rep(c("A", "B"), each = 5))
G. Grothendieck
  • Thank you so much for your answer! I actually have been using ts and zoo before with my decadal data and it has been working well (should have mentioned it in the question). – mabe Jan 22 '21 at 21:03

Extending tsibble to support a new index requires defining methods for these generics:

  • index_valid() - This method should return TRUE if the class is acceptable as an index
  • interval_pull() - This method accepts your index values and computes the interval of the data. The interval can be created using tsibble::new_interval(). You may find tsibble::gcd_interval() useful for computing the smallest interval.
  • seq() and + - These methods are used to produce future time values using the new_data() function.

A minimal example of a new tsibble index class for 'year' is as follows:

library(tsibble)
#> 
#> Attaching package: 'tsibble'
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, union
library(vctrs)

# Object creation function
my_year <- function(x = integer()) {
  x <- vec_cast(x, integer())
  vctrs::new_vctr(x, class = "year")
}

# Declare this class as a valid index
index_valid.year <- function(x) TRUE

# Compute the interval of a year input
interval_pull.year <- function(x) {
  tsibble::new_interval(
    year = tsibble::gcd_interval(vec_data(x))
  )
}

# Specify how sequences are generated from years
seq.year <- function(from, to, by, length.out = NULL, along.with = NULL, ...) {
  from <- vec_data(from)
  if (!rlang::is_missing(to)) {
    vec_assert(to, my_year())
    to <- vec_data(to)
  }
  my_year(NextMethod())
}

# Define `+` operation as needed for `new_data()`
vec_arith.year <- function(op, x, y, ...) {
  my_year(vec_arith(op, vec_data(x), vec_data(y), ...))
}

# Use the new index class
x <- tsibble::tsibble(
  year = my_year(c(2018, 2020, 2024)),
  y = rnorm(3), 
  index = "year"
)
x
#> # A tsibble: 3 x 2 [2Y]
#>     year      y
#>   <year>  <dbl>
#> 1   2018  0.211
#> 2   2020 -0.410
#> 3   2024  0.333
interval(x)
#> <interval[1]>
#> [1] 2Y
new_data(x, 3)
#> # A tsibble: 3 x 1 [2Y]
#>     year
#>   <year>
#> 1   2026
#> 2   2028
#> 3   2030

Created on 2021-02-08 by the reprex package (v0.3.0)

  • Thank you very much for taking the time to add your answer. I will try it out and post an update in time. – mabe Mar 26 '21 at 09:34