Problems with modeling AR(1) with three different techniques in R: 1. the formula 2.for-loop 3.arima.sim

Question

(reproducible example added)
I tried to model Y_t=0.6Y_t-1+V_t (V_t ~ N(0,1)) AR(1) with three different techniques.

1. The formula Y_t =ρ^t[y₁/ρ + ∑_{j=2 to t} (α+V_j)/ρ^j] of Y_t=α+ρY_t-1+V_t obtained via back substitution and using cumulative sum.

2. The classical for loop

3. arima.sim simulation.

While I expect to get more or less the same things, interesting and unexpected things happened:

N <- 1388; ro <- 0.6; a <- 0
set.seed(1)
v <- ts(rnorm(N,0,1))

# Formula technique
y1 <- ts(rep(0,N))  # y[1]:=0 defined
y1[-1] <- ro^(2:N) * (y1[1]/ro + cumsum((a+v[-1]) / ro^(2:N)))

# The classical "for" loop
y2 <- ts(rep(0,N))  # y2[1]:=0 defined    
for (t in 2:N){  y2[t] <- a + ro*y2[t-1]+v[t] }

# arima.sim simulation 
set.seed(1) 
y3 <- arima.sim(model=list(ar=0.6), n=1388, mean=0, sd=sqrt(1)) 
# change n in arima.sim accordingly such that n=N simultaneously with the N definition above


c(mean(y1),sd(y1))
c(mean(y2),sd(y2))
c(mean(y3),sd(y3))

N=1388 (and n=1388) gives:

[1] -0.03713488  1.26102077
[1] -0.03713488  1.26102077
[1] -0.01048798  1.28445899

N=1389 (and n=1389) gives:

[1] Inf NaN
[1] -0.03661779  1.26071373
[1] -0.01048798  1.28445899

The standard deviation values are just as expected:
StdDev(Y_t)=sqrt[sd(v)²/(1-ρ²)]

sqrt(sd(v)^2/(1-(0.6)^2)) # 1.284229

Strangenesses appear for the formula technique:

c(mean(y1),sd(y1)) is defined for N<=1388
c(mean(y1),sd(y1)) is (Inf, NaN) for N=1389 and N=1390
c(mean(y1),sd(y1)) is (NaN, NA) for N>=1391.

Question:
1. Why do the formula technique fail for N>=1989?
(though we have a stationary series with ρ=0.6)

2. Why do we get c(mean(y1),sd(y1))#(Inf, NaN) for N=1389 and N=1390 and c(mean(y1),sd(y1))#(NaN, NA) for N>=1391?

Perfect question for stats.stackexchange.com. – FatihAkici Aug 03 '19 at 16:56 — FatihAkici, Aug 03 '19 at 16:56
Welcome to floating point numbers – Grada Gukovic Aug 03 '19 at 17:40 — Grada Gukovic, Aug 03 '19 at 17:40

Grada Gukovic · Accepted Answer · 2019-08-07T10:03:40.620

With N <- 1389; ro <- 0.6; a <- 0 the last element of cumsum((a+v[-1]) / ro^(2:N)) is larger than the maximally representable number on your PC i.e. your .Machine$double.xmax. (I am assuming here it is maximally 1.797693e+308. Thus it is represented as Inf. This causes the last element of y1 to become an Inf as well as any legal arithmetic operation on Inf and a numeric yields Inf. sd of a vector conatining an Inf is a NaN:

N <- 1389; ro <- 0.6; a <- 0
set.seed(1)
v <- ts(rnorm(N,0,1))

# Formula technique
y1 <- ts(rep(0,N))  # y[1]:=0 defined
y1[-1] <- ro^(2:N) * (y1[1]/ro + cumsum((a+v[-1]) / ro^(2:N)))

tail((a+v[-1]) / ro^(2:N))
#[1] -8.183997e+306  1.049540e+307 -1.581977e+306 -3.189985e+307 -7.577105e+307
#[6]            Inf
tail(cumsum((a+v[-1]) / ro^(2:N)))
#[1] -1.814422e+307 -7.648815e+306 -9.230793e+306 -4.113064e+307 -1.169017e+308
#[6]            Inf
tail(y1)
#[1] -1.6598499 -0.4198324 -0.3039989 -0.8127365 -1.3859780        Inf

N <- 1390; ro <- 0.6; a <- 0. The last value of (a+v[-1]) / ro^(2:N) is representable and thus nothing changes.

N <- 1390; ro <- 0.6; a <- 0
set.seed(1)
v <- ts(rnorm(N,0,1))

# Formula technique
y1 <- ts(rep(0,N))  # y[1]:=0 defined
y1[-1] <- ro^(2:N) * (y1[1]/ro + cumsum((a+v[-1]) / ro^(2:N)))

tail((a+v[-1]) / ro^(2:N))
#[1]  1.049540e+307 -1.581977e+306 -3.189985e+307 -7.577105e+307            Inf
#[6] -5.546904e+307
tail(cumsum((a+v[-1]) / ro^(2:N)))
#[1] -7.648815e+306 -9.230793e+306 -4.113064e+307 -1.169017e+308            Inf
#[6]            Inf
tail(y1)
#[1] -0.4198324 -0.3039989 -0.8127365 -1.3859780        Inf        Inf

N <- 1391; ro <- 0.6; a <- 0 the last value of (a+v[-1]) / ro^(2:N) becomes -Inf (smaller than .Machine$double.xmin) and the cumsum has to add Inf and -Inf, which gives a NaN, which means not a number. The mean of NaN s is not a number as well and the sd(NaN) is NA.

N <- 1391; ro <- 0.6; a <- 0
set.seed(1)
v <- ts(rnorm(N,0,1))

# Formula technique
y1 <- ts(rep(0,N))  # y[1]:=0 defined
y1[-1] <- ro^(2:N) * (y1[1]/ro + cumsum((a+v[-1]) / ro^(2:N)))


tail((a+v[-1]) / ro^(2:N))
#[1] -1.581977e+306 -3.189985e+307 -7.577105e+307            Inf -5.546904e+307
#[6]           -Inf
tail(cumsum((a+v[-1]) / ro^(2:N)))
#[1] -9.230793e+306 -4.113064e+307 -1.169017e+308            Inf            Inf
#[6]            NaN
tail(y1)
#[1] -0.3039989 -0.8127365 -1.3859780        Inf        Inf        NaN

Problems with modeling AR(1) with three different techniques in R: 1. the formula 2.for-loop 3.arima.sim

1 Answers1