I am trying to decide between using lm and dynlm for some regressions, and I ran some tests to check whether the lag operator gives reliable results. However, I found a result that seemed unusual to me.

First, I estimated the model in lm with a regressor that was already lagged (1), and then with the lag function applied inside the formula (2).

(1)

lm_PCLR_stack <- lm(IPCA_diff1 ~ IPCA_diff1_lag1 + U3_log - 1, data = final_data_clean)
summary(lm_PCLR_stack)


Call:
lm(formula = IPCA_diff1 ~ IPCA_diff1_lag1 + U3_log - 1, data = final_data_clean)

Residuals:
       Min         1Q     Median         3Q        Max 
-0.0067594 -0.0021942 -0.0001499  0.0017506  0.0123410 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
IPCA_diff1_lag1 0.5348701  0.0934555   5.723 1.53e-07 ***
U3_log          0.0030485  0.0007368   4.138 8.22e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.003406 on 85 degrees of freedom
Multiple R-squared:  0.8029,    Adjusted R-squared:  0.7983 
F-statistic: 173.1 on 2 and 85 DF,  p-value: < 2.2e-16

(2)

lmlag_PCLR_stack <- lm(IPCA_diff1 ~ lag(IPCA_diff1, 1) + U3_log - 1, data = final_data_clean)
summary(lmlag_PCLR_stack)


Call:
lm(formula = IPCA_diff1 ~ lag(IPCA_diff1, 1) + U3_log - 1, data = final_data_clean)

Residuals:
       Min         1Q     Median         3Q        Max 
-0.0067577 -0.0021978 -0.0001549  0.0017687  0.0123458 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
lag(IPCA_diff1, 1) 0.5342741  0.0942942   5.666 2.00e-07 ***
U3_log             0.0030494  0.0007412   4.114 9.03e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.003426 on 84 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.8004,    Adjusted R-squared:  0.7956 
F-statistic: 168.4 on 2 and 84 DF,  p-value: < 2.2e-16

I then repeated the process using dynlm. With the pre-lagged regressor (3), the result was the same as in (1). With the lag operator (4), however, the result was not only different but also strange: R reports an essentially perfect fit.

(3)

dynlm_PCLR_stack <- dynlm(IPCA_diff1 ~ IPCA_diff1_lag1 + U3_log - 1, data = final_data_clean)
summary(dynlm_PCLR_stack)

Time series regression with "numeric" data:
Start = 1, End = 87

Call:
dynlm(formula = IPCA_diff1 ~ IPCA_diff1_lag1 + U3_log - 1, data = final_data_clean)

Residuals:
       Min         1Q     Median         3Q        Max 
-0.0067594 -0.0021942 -0.0001499  0.0017506  0.0123410 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
IPCA_diff1_lag1 0.5348701  0.0934555   5.723 1.53e-07 ***
U3_log          0.0030485  0.0007368   4.138 8.22e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.003406 on 85 degrees of freedom
Multiple R-squared:  0.8029,    Adjusted R-squared:  0.7983 
F-statistic: 173.1 on 2 and 85 DF,  p-value: < 2.2e-16

(4)

dynlm_lag_PCLR_stack <- dynlm(IPCA_diff1 ~ L(IPCA_diff1) + (U3_log) - 1, data = final_data_clean)
summary(dynlm_lag_PCLR_stack)

essentially perfect fit: summary may be unreliable
Time series regression with "numeric" data:
Start = 1, End = 87

Call:
dynlm(formula = IPCA_diff1 ~ L(IPCA_diff1) + (U3_log) - 1, data = final_data_clean)

Residuals:
       Min         1Q     Median         3Q        Max 
-2.302e-18 -4.373e-19 -2.505e-19 -1.223e-19  3.053e-17 

Coefficients:
               Estimate Std. Error  t value Pr(>|t|)    
L(IPCA_diff1) 1.000e+00  9.088e-17 1.10e+16   <2e-16 ***
U3_log        1.575e-19  7.112e-19 2.21e-01    0.825    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.359e-18 on 85 degrees of freedom
Multiple R-squared:      1, Adjusted R-squared:      1 
F-statistic: 2.217e+32 on 2 and 85 DF,  p-value: < 2.2e-16

Any tips on what might have happened? There are no NAs in the data. Thanks in advance!

HenriqueP.
    Please provide a simple self-contained and reproducible example. Otherwise it's hard to find out what happened. It seems that the differences come from using the lag() operator on numeric data (rather than time series in ts or zoo). But I couldn't reconstruct what was going on exactly. For using lm() vs. dynlm() with "ts" data, see the simple example on slides 45-46 at https://eeecon.uibk.ac.at/~zeileis/teaching/AER/Ch-TimeSeries.pdf – Achim Zeileis Jan 27 '20 at 23:17
    Thanks for the answer, @AchimZeileis. You're right. I tested it with a zoo and everything worked fine. Thanks for the reading material. – HenriqueP. Jan 29 '20 at 18:36
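
The point raised in the comments, that lag() behaves differently on plain numeric data than on time series, can be checked in base R. The sketch below uses a short made-up vector x (hypothetical, not the poster's IPCA_diff1) and calls stats::lag() explicitly. It shows that on a numeric vector the values stay in place and only the time attribute moves, so pairing the result with the original by row position pairs each value with itself. Whether this is exactly what happened in regression (4) cannot be confirmed from the thread.

```r
# Hypothetical data standing in for IPCA_diff1; not the poster's series.
x <- c(0.5, 0.7, 0.2, 0.9, 0.4)

# stats::lag() on a plain numeric vector leaves the values untouched
# and only shifts the "tsp" (time) attribute.
x_lag <- stats::lag(x, -1)
same_values   <- identical(as.numeric(x_lag), x)  # TRUE: no value was moved
shifted_start <- tsp(x_lag)[1]                    # start time moved from 1 to 2

# With "ts" data, aligning by time gives the intended pairing:
# the first column is x at time t, the second is x at time t - 1.
x_ts    <- ts(x)
aligned <- ts.intersect(x_ts, x_lag1 = stats::lag(x_ts, -1))
print(aligned)
```

Because ts.intersect() matches observations by time rather than by row, the first observation drops out and four aligned rows remain. This fits the fix reported above, where converting the data to a time series class (zoo in that case) resolved the problem.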
