I am experimenting with functions from the `imputeTS` package. This package provides several functions to impute missing values in univariate time series data. I tested them and they all work great, except the `na_kalman` function. This function modifies the original numeric vector that I pass to it. Below is an example.
```r
# Load packages
library(imputeTS)
# Set seed
set.seed(123)
# Generate 10 random numbers
dat <- rnorm(10)
# Replace the first 5 numbers with NA
dat[1:5] <- NA
# Check the numbers in dat
dat
[1] NA NA NA NA NA 1.7150650 0.4609162 -1.2650612 -0.6868529
[10] -0.4456620
```
As you can see, I created a vector of 10 numbers, the first 5 of which are `NA`.
```r
# Apply the na_kalman function
dat2 <- na_kalman(dat)
# Check the numbers in dat2
dat2
[1] 1.7150650 1.7150650 1.7150650 1.7150650 1.7150650 1.7150650 0.4609162 -1.2650612 -0.6868529
[10] -0.4456620
# Check the numbers in dat again
dat
[1] 1.7150650 1.7150650 1.7150650 1.7150650 1.7150650 1.7150650 0.4609162 -1.2650612 -0.6868529
[10] -0.4456620
```
As `dat2` shows, `na_kalman` successfully imputed the `NA` values. However, the original vector, `dat`, was also changed. This is behavior I want to avoid. Is there a way to tell `na_kalman` not to modify the original vector?
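One possible workaround, sketched under the assumption that `na_kalman` writes directly into the memory of the vector it receives (the cause is not confirmed here): hand it a freshly allocated deep copy, so any in-place write lands on the copy rather than on `dat`. `impute_on_copy()` is a hypothetical helper, not part of `imputeTS`.

```r
# Hypothetical helper (not part of imputeTS): deep-copy `x` before passing it
# to `impute_fun`, so an in-place write by `impute_fun` only touches the copy.
impute_on_copy <- function(x, impute_fun, ...) {
  # serialize/unserialize allocates a brand-new object with the same
  # values, storage type, and attributes (e.g. ts attributes)
  x_copy <- unserialize(serialize(x, NULL))
  impute_fun(x_copy, ...)
}

set.seed(123)
dat <- rnorm(10)
dat[1:5] <- NA

# Only runs if imputeTS is installed
if (requireNamespace("imputeTS", quietly = TRUE)) {
  dat2 <- impute_on_copy(dat, imputeTS::na_kalman)
  print(dat)   # should still contain the five leading NAs if the assumption holds
  print(dat2)
}
```

The `serialize()`/`unserialize()` round trip is used instead of something like `x + 0` so that integer input keeps its storage type; the extra allocation is negligible for series of the size used here.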
Note
When I changed the vector length to a larger number, such as `rnorm(1000)`, I noticed that all the missing values in `dat` were changed to the first non-missing value in the original data. So it seems that `dat` is not simply a copy of `dat2` after running `na_kalman`. I also tested other functions from the `imputeTS` package, such as `na_interpolation`, `na_locf`, and `na_mean`. They don't have this behavior: `dat` remains unchanged after running those functions.
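The comparison between functions described above could be made systematic with a small check: snapshot the input, call the function, and test whether the original object still matches the snapshot. `mutates_input()` is a hypothetical helper written for this sketch; the function list simply mirrors the four functions mentioned in this question.

```r
# Hypothetical helper: TRUE if calling fun(x) changed the object `x` itself.
mutates_input <- function(fun, x) {
  snapshot <- unserialize(serialize(x, NULL))  # independent deep copy taken before the call
  fun(x)                                       # result is discarded; only side effects matter
  !identical(x, snapshot)                      # TRUE if the original no longer matches
}

set.seed(123)
dat <- rnorm(10)
dat[1:5] <- NA

# Only runs if imputeTS is installed
if (requireNamespace("imputeTS", quietly = TRUE)) {
  funs <- list(
    na_kalman        = imputeTS::na_kalman,
    na_interpolation = imputeTS::na_interpolation,
    na_locf          = imputeTS::na_locf,
    na_mean          = imputeTS::na_mean
  )
  # Give each function its own fresh copy of dat, so a side effect from
  # one check cannot leak into the next one
  print(sapply(funs, function(f) mutates_input(f, unserialize(serialize(dat, NULL)))))
}
```

Ordinary R functions cannot alter their arguments because of copy-on-modify semantics, so a `TRUE` from this check would point to a write happening below the R level.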