
I found that the output of predict_in_sample() in pmdarima (Python package) differs from the output of fitted() in forecast (R package) when the differencing order is not zero.

I compared them for three differencing orders (d = 1, 2, 3):

par(mfrow=c(3,1))

# output of predict_in_sample() for each difference order
pred_val_py_diff_ord = rbind(c(-0.007880175267916512, 0.49169989404457004, 0.28841210348688023, 0.5696976455105438, 0.32260810264909945, 0.5077959474596667, 0.47801250879611523, 0.4985613475136129, 0.07088064549811457, 0.1008287770281715, 0.3425456601541411, 0.6408112270989074, 0.15507516379071695, 0.21370781747047746, 0.21008435474885973, 0.3850643917676375, 0.4155697579047887, 0.18344750047620495, 0.27881111807639414, 0.4178551640573445, 0.23041397621421822),
                             c(0.006965460018175224, 0.7598183999091976, -0.18935901503724428, 0.9169934990855064, 0.4433484668316683, 0.35237566312822355, -0.34776516890780296, 0.21444433609188196, 0.5553043926598136, 0.9436076058879592, 0.0395336961477053, 0.31321992958241585, 0.21116818372891794, 0.6250668776266268, 0.4624525102200333, 0.15295908421139226, 0.2733396734717508, 0.2968495632295873, 0.23032370622077558, 0.02165174802208386, 0.48012563330174163),
                             c(0.005566657258041537, 1.0102938244669462, -0.44428684727769174, 2.6529682660405225, 0.3185732278045508, -0.22612855833146328, -1.1471031442185335, 0.6795229000616105, 1.595506218619741, 1.2694691997522631, -1.2341607448281202, 0.040570110630779865, 0.8778166427568617, 0.8866121795729558, 0.30225375086513107, -0.44107762925047367, 0.29536136817357606, 0.6619548819980223, -0.010701331460473806, -0.4242463222526414, 1.153699395736004))

ori_time_series <- c(0.49958017, 0.15162735, 0.86757565, 0.3093554, 0.20545085, -0.48288408, 0.6880291,
                     0.8461229, 0.8320223, -0.7372907, 0.6048833, 0.40874475, 0.57708055, 0.27590698,
                     -0.21213382, 0.4236031, 0.3324298, -0.076647766, -0.20372462, 0.93162024, 0.5740154)

library(forecast)  # load once, outside the loop

for (diff_ord in 1:3) {
  fit.model <- Arima(ori_time_series, order = c(2, diff_ord, 1))
  
  fitted_val_r_forecast_arima <- fitted(fit.model)
  fitted_val_py <- pred_val_py_diff_ord[diff_ord,]
  plot.ts(ori_time_series, xaxp = c(0, 21, 21), ylim = c(-1.5,1.5))
  lines(fitted_val_r_forecast_arima, col='green', lty=3)
  lines(fitted_val_py, col='red', lty=4)
  mtext(paste("Compare fitted vals of R and pred_in_sample vals of python for diff order:", diff_ord))
}

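Output image: three stacked panels, one per differencing order, each showing the original time series (black), fitted() from R (green), and predict_in_sample() from Python (red).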

  • I observe that the first d (differencing order) terms differ greatly, and the two outputs become close after the first d terms.
  • I think the output of forecast (R package) is the correct one.
    • This is based on comparing the first d terms of the original time series (black line) with the output of fitted() (green line).
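
One possible reason the first d terms are special (my assumption, not something I have confirmed in either library) is that d-th order differencing consumes the first d observations, so the differenced series has nothing to say about them. A minimal pure-Python sketch of that effect, where difference() is a hypothetical helper and not part of pmdarima or forecast:

```python
def difference(series, d):
    """Apply first-order differencing d times (hypothetical helper)."""
    out = list(series)
    for _ in range(d):
        # Each pass replaces the series by consecutive differences,
        # which shortens it by exactly one element.
        out = [b - a for a, b in zip(out, out[1:])]
    return out


ori_time_series = [
    0.49958017, 0.15162735, 0.86757565, 0.3093554, 0.20545085, -0.48288408, 0.6880291,
    0.8461229, 0.8320223, -0.7372907, 0.6048833, 0.40874475, 0.57708055, 0.27590698,
    -0.21213382, 0.4236031, 0.3324298, -0.076647766, -0.20372462, 0.93162024, 0.5740154,
]

for d in range(1, 4):
    diffed = difference(ori_time_series, d)
    lost = len(ori_time_series) - len(diffed)
    print(f"d={d}: differenced length={len(diffed)}, observations lost={lost}")
```

How each library fills in those first d fitted values is a separate question; this only shows that exactly d observations are affected.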

My question is:

  • How should I adjust the parameters of predict_in_sample() in pmdarima to get the same output as fitted() in forecast?
    • I have tried the start parameter, but it only shortens the output of predict_in_sample().
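
Until I find such a parameter, a workaround sketch is to compare the two outputs only from index d onward, which is the same slice that start leaves behind. This does not make the first d values match; it only checks agreement on the rest. The helper name compare_after_first_d, the tolerance tol, and the short input arrays below are all hypothetical placeholders, not real model output:

```python
def compare_after_first_d(fitted_r, pred_py, d, tol=0.05):
    """Return (largest absolute gap over indices >= d, whether it is within tol)."""
    if len(fitted_r) != len(pred_py):
        raise ValueError("both series must have the same length")
    # Skip the first d terms, where the two libraries are known to disagree.
    gaps = [abs(r - p) for r, p in zip(fitted_r[d:], pred_py[d:])]
    worst = max(gaps) if gaps else 0.0
    return worst, worst <= tol


# Placeholder values for illustration only (not output from either package).
r_vals = [0.50, 0.15, 0.40, 0.30]
py_vals = [0.00, 0.76, 0.41, 0.29]

worst, ok = compare_after_first_d(r_vals, py_vals, d=2)
print(f"largest gap after the first 2 terms: {worst:.3f}, within tolerance: {ok}")
```

In practice the inputs would be the fitted() values exported from R and the full-length predict_in_sample() output for the same differencing order.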

PS. The following code is an example of how I generate the output of predict_in_sample():

import numpy as np
from pmdarima.arima import ARIMA, auto_arima

for diff_ord in range(1,4):
    model = ARIMA(order=(2,diff_ord,1), out_of_sample_size=0, mle_regression=True, suppress_warnings=True)

    ori_time_series = np.array([0.49958017, 0.15162735, 0.86757565, 0.3093554, 0.20545085, -0.48288408, 0.6880291,
                                0.8461229, 0.8320223, -0.7372907, 0.6048833, 0.40874475, 0.57708055, 0.27590698,
                                -0.21213382, 0.4236031, 0.3324298, -0.076647766, -0.20372462, 0.93162024, 0.5740154])

    model = model.fit(ori_time_series)
    pred_in_sample = np.array(model.predict_in_sample())
    print(f"pred_in_sample: {list(pred_in_sample)}")
theabc50111