I am trying to gap-fill weather data. My real data is half-hourly, but the reproducible example below uses hourly data. Because weather data is seasonal, I first create a time series with stats::ts() and then pass it to a Kalman-based imputation (imputeTS::na_seadec with algorithm = "kalman") or to forecast::na.interp. However, this is very slow. If I pass the raw data to the Kalman filter without creating a ts object first, it is fast, but the seasonality information is lost. I also tried find_frequency = TRUE in imputeTS::na_seadec(), but that is again far too slow (it takes hours for a single time series). Is there a way to use the Kalman filter while preserving the seasonality information?
library(riem)
library(dplyr)
library(imputeTS)
library(forecast)
library(stats)
library(plotly)
Raw_data <- riem_measures("SFO", date_start = "2010-01-01")
Gapfilled <- Raw_data %>%
  dplyr::mutate(tmpfts = ts(data = .$tmpf,
                            start = min(time(valid)),
                            frequency = 24)) %>%
  dplyr::mutate(ts_interpFilled = forecast::na.interp(tmpfts) %>% as.numeric(),
                na_seadecKalman = imputeTS::na_seadec(tmpfts, algorithm = "kalman"),
                na_seadecma = imputeTS::na_seadec(tmpf, algorithm = "ma"),
                # na_kalman = imputeTS::na_kalman(tmpfts, model = "auto.arima"),
                tsclean = forecast::tsclean(tmpfts) %>% as.numeric()
  )
plot_ly(Gapfilled, x = ~valid) %>%
  add_trace(y = ~tmpf, name = 'Actuals', mode = 'lines', type = 'scatter') %>%
  add_trace(y = ~ts_interpFilled, name = 'forecast::na.interp', mode = 'lines', type = 'scatter') %>%
  add_trace(y = ~na_seadecma, name = 'imputeTS::na_seadecma', mode = 'lines', type = 'scatter') %>%
  add_trace(y = ~tsclean, name = 'forecast::tsclean', mode = 'lines', type = 'scatter') %>%
  # add_trace(y = ~na_kalman, name = 'imputeTS::na_kalman', mode = 'lines', type = 'scatter') %>%
  add_trace(y = ~na_seadecKalman, name = 'imputeTS::na_seadecKalman', mode = 'lines', type = 'scatter')
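
To take the download and the size of the real series out of the picture, here is a smaller synthetic sketch of the same comparison: the same gappy values are passed to na_seadec() once as a plain vector and once as a ts with frequency = 24. The series itself (a clean 24-hour sine cycle plus noise), the gap positions, and the names temp, temp_gappy and temp_ts are made up for illustration and are not taken from the SFO data. As far as I understand, na_seadec() on a plain vector finds no seasonality and falls back to the base algorithm with a warning, which I suppress here.

```r
library(imputeTS)

set.seed(42)
n_days <- 30
hours  <- seq_len(n_days * 24)

# Hypothetical stand-in for station temperatures: daily cycle plus noise
temp <- 60 + 10 * sin(2 * pi * hours / 24) + rnorm(length(hours), sd = 1)

# Remove a two-day block and 50 scattered points
temp_gappy <- temp
temp_gappy[500:547] <- NA
temp_gappy[sample(hours, 50)] <- NA

# Same values with the daily period attached
temp_ts <- ts(temp_gappy, frequency = 24)

# Plain vector: no seasonal information available to the imputation
t_raw <- system.time(
  filled_raw <- suppressWarnings(na_seadec(temp_gappy, algorithm = "kalman"))
)

# ts with frequency = 24: seasonal decomposition, then Kalman imputation
t_ts <- system.time(
  filled_ts <- na_seadec(temp_ts, algorithm = "kalman")
)

# Compare run time and mean absolute error on the removed values
gap_idx <- which(is.na(temp_gappy))
mae_raw <- mean(abs(as.numeric(filled_raw)[gap_idx] - temp[gap_idx]))
mae_ts  <- mean(abs(as.numeric(filled_ts)[gap_idx] - temp[gap_idx]))

print(rbind(
  elapsed_sec = c(raw = t_raw[["elapsed"]], ts = t_ts[["elapsed"]]),
  mae         = c(mae_raw, mae_ts)
))
```

Increasing n_days, or switching to frequency = 48 for half-hourly data, should show how the run time of the seasonal version grows with series length on a given machine.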