In the tsibbledata package, the key of the vic_elec data looks like a row-based key.

library(tsibble)
library(tsibbledata)
library(lubridate)
data('vic_elec')
str(vic_elec)
tsibble [52,608 x 5] (S3: tbl_ts/tbl_df/tbl/data.frame)
 $ Time       : POSIXct[1:52608], format: "2012-01-01 00:00:00" "2012-01-01 00:30:00" "2012-01-01 01:00:00" "2012-01-01 01:30:00" ...
 $ Demand     : num [1:52608] 4383 4263 4049 3878 4036 ...
 $ Temperature: num [1:52608] 21.4 21.1 20.7 20.6 20.4 ...
 $ Date       : Date[1:52608], format: "2012-01-01" "2012-01-01" "2012-01-01" "2012-01-01" ...
 $ Holiday    : logi [1:52608] TRUE TRUE TRUE TRUE TRUE TRUE ...
 - attr(*, "key")= tibble [1 x 1] (S3: tbl_df/tbl/data.frame)
  ..$ .rows: list<int> [1:1] 
  .. ..$ : int [1:52608] 1 2 3 4 5 6 7 8 9 10 ...
  .. ..@ ptype: int(0) 
 - attr(*, "index")= chr "Time"
  ..- attr(*, "ordered")= logi TRUE
 - attr(*, "index2")= chr "Time"
 - attr(*, "interval")= interval [1:1] 30m
  ..@ .regular: logi TRUE

But when I transform my own data, I can't use the row names as the key. When I cannot find a column that can serve as the key, how can I apply a row-based key like the one in the vic_elec data?

# Example data
ex <- data.frame(date_time = c("2020-01-01","2020-01-01","2020-01-02","2020-01-02","2020-01-03","2020-01-03","2020-01-04","2020-01-04"),
                 temperature = c(12,14,15,18,16,11,17,17),
                 humidity = c(78,82,76,72,71,75,74,71))

ex$date_time <- as.Date(ex$date_time)
ex
date_time  temperature  humidity
2020-01-01          12        78
2020-01-01          14        82
2020-01-02          15        76
2020-01-02          18        72
2020-01-03          16        71
2020-01-03          11        75
2020-01-04          17        74
2020-01-04          17        71
> ex %>%as_tsibble(index = date_time)
Error: A valid tsibble must have distinct rows identified by key and index.
i Please use `duplicates()` to check the duplicated rows.
Run `rlang::last_error()` to see where the error occurred.
> ex %>%as_tsibble(key = row.names(ex), index = date_time)
Error: Can't subset columns that don't exist.
x Columns `1`, `2`, `3`, `4`, `5`, etc. don't exist.
Sang won kim
  • Why do you have 2 measurements of temperature and humidity for each day? Is one measurement taken at midday and the other at midnight? This is why you are unable to create the tsibble, as it requires one row per time point in each series. – Mitchell O'Hara-Wild Jun 01 '21 at 00:57
  • @MitchellO'Hara-Wild Actually, the real data is ymd_h data, not ymd; this is just an example. But that was a helpful comment (I also checked the comments on the answer). – Sang won kim Jun 01 '21 at 04:27

1 Answer

You can create a new column with row_number() and use it as a key.

library(dplyr)
library(tsibble)

data <- ex %>%
  mutate(key  = row_number()) %>%
  as_tsibble(index = date_time, key = key)
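
To see what a row-number key produces, here is a minimal sketch that rebuilds the example data from the question and counts the resulting series with `n_keys()`. Each row becomes its own series, so the number of keys equals the number of rows.

```r
library(dplyr)
library(tsibble)

# Same example data as in the question
ex <- data.frame(
  date_time = as.Date(c("2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02",
                        "2020-01-03", "2020-01-03", "2020-01-04", "2020-01-04")),
  temperature = c(12, 14, 15, 18, 16, 11, 17, 17),
  humidity = c(78, 82, 76, 72, 71, 75, 74, 71)
)

# Use the row number as the key, as in the snippet above
data <- ex %>%
  mutate(key = row_number()) %>%
  as_tsibble(index = date_time, key = key)

# Count the separate time series in the tsibble (one per key value)
n_keys(data)
```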
Ronak Shah
  • While setting the key to be the row number allows the tsibble to be created, the resulting time series have little value. This is because it will produce a dataset of time series which each only have one time point. At that point, it is better to keep the data in a tibble format. – Mitchell O'Hara-Wild Jun 01 '21 at 03:06
  • But if I want to apply a time series model, like STL, I can't use a data.frame or tibble, right? – Sang won kim Jun 01 '21 at 04:37
  • Yes, to apply a time series model you'll need a time series. The solution above produces a time series tsibble which splits the data into many time series of length 1. You can't use an STL model with a single time point. Do you want to model the data in the order specified (`temperature = c(12,14,15,18,16,11,17,17)`)? If so you'll need to change the time index so each observation has its own time point. – Mitchell O'Hara-Wild Jun 01 '21 at 14:26
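
The last suggestion, giving each observation its own time point, could be sketched as follows. This is only an illustration under a loud assumption: that the two readings per day were taken at 00:00 and 12:00 (the `hour` helper column is hypothetical and not part of the original data). With real ymd_h data, the actual timestamps would be used as the index instead.

```r
library(dplyr)
library(tsibble)
library(lubridate)

# Same example data as in the question
ex <- data.frame(
  date_time = as.Date(c("2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02",
                        "2020-01-03", "2020-01-03", "2020-01-04", "2020-01-04")),
  temperature = c(12, 14, 15, 18, 16, 11, 17, 17),
  humidity = c(78, 82, 76, 72, 71, 75, 74, 71)
)

ts <- ex %>%
  group_by(date_time) %>%
  # ASSUMPTION: the n-th reading of a day was taken at (n - 1) * 12 hours
  mutate(hour = (row_number() - 1) * 12) %>%
  ungroup() %>%
  # Combine the date and the assumed hour into a distinct date-time index
  mutate(date_time = as_datetime(date_time) + hours(hour)) %>%
  select(-hour) %>%
  as_tsibble(index = date_time)

# A single series with one row per time point
n_keys(ts)
interval(ts)
```

Because every row now has a unique index value, no key is needed and the data forms one series, which is the shape a time series model expects. Whether a series is long enough for a given model, such as STL, is a separate question.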