
I have the following dataset, test_1:

Date    Frequency
0   2020-01-20  10
1   2020-01-21  2
2   2020-01-22  1
3   2020-01-23  10
4   2020-01-24  6
... ... ...
74  2020-04-04  7
75  2020-04-05  9
76  2020-04-06  8
77  2020-04-07  6
78  2020-04-08  1

where Frequency is a calculated column that holds the number of users for each date.
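For context, a column like Frequency is typically produced with a groupby count. This is a minimal sketch under assumptions: the raw table `visits` and its `user_id` column are hypothetical, and it counts one row per visit (use `nunique()` on `user_id` instead if each user should count once per date).

```python
import pandas as pd

# Hypothetical raw data: one row per user visit (table and column names are assumptions)
visits = pd.DataFrame({
    "user_id": [1, 2, 3, 1, 4, 2, 5],
    "Date": ["2020-01-20", "2020-01-20", "2020-01-20",
             "2020-01-21", "2020-01-21", "2020-01-22", "2020-01-22"],
})

# Count rows per date; reset_index turns the counts back into a Date/Frequency DataFrame
test_1 = visits.groupby("Date").size().reset_index(name="Frequency")
print(test_1)
```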

I would like to predict future trends, and I am considering an ARIMA model for this. I have used this code:

# fit model
model = ARIMA(test_1, order=(5,1,0))
model_fit = model.fit(disp=0)
print(model_fit.summary())
# plot residual errors
residuals = DataFrame(model_fit.resid)
residuals.plot()
pyplot.show()
residuals.plot(kind='kde')
pyplot.show()
print(residuals.describe())

but I got this error:

ValueError: Pandas data cast to numpy dtype of object. Check input data with np.asarray(data).

It is raised by the line model = ARIMA(test_1, order=(5,1,0)).

Do you know what it means and how I could fix it?

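This error means that statsmodels converted your input to a NumPy array and ended up with dtype object, which it cannot model. ARIMA expects a single numeric series, but test_1 is a DataFrame that also contains the Date column, so np.asarray(test_1) yields an object array.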
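You can run the check that the error message suggests. A small sketch using the first five rows shown in the question, assuming Date is stored as strings (a datetime64 Date column gives the same object result when mixed with an integer column):

```python
import numpy as np
import pandas as pd

# First five rows from the question
test_1 = pd.DataFrame({
    "Date": ["2020-01-20", "2020-01-21", "2020-01-22", "2020-01-23", "2020-01-24"],
    "Frequency": [10, 2, 1, 10, 6],
})

# Mixing a text/date column with a numeric one forces NumPy's object dtype,
# which is what statsmodels rejects
print(np.asarray(test_1).dtype)               # object
# The Frequency column alone converts to a plain integer array
print(np.asarray(test_1["Frequency"]).dtype)  # an integer dtype, e.g. int64
```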

You can fix it by passing test_1["Frequency"] instead of test_1. I have also fixed a few other issues I noticed in your code:

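import pandas as pd
import matplotlib.pyplot as pyplot
# statsmodels >= 0.13 removed statsmodels.tsa.arima_model; on older versions,
# "from statsmodels.tsa.arima_model import ARIMA" with model.fit(disp=0) still works
from statsmodels.tsa.arima.model import ARIMA

# fit model on the numeric Frequency column only, not the whole DataFrame
model = ARIMA(test_1["Frequency"], order=(5,1,0)) #<--- change this
model_fit = model.fit()
print(model_fit.summary())
# plot residual errors (pd.DataFrame, because DataFrame itself was never imported)
residuals = pd.DataFrame(model_fit.resid)
residuals.plot(kind='kde')
print(residuals.describe())
pyplot.show()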
Anwarvic
  • Thank you so much, Anwarvic. I now get another error on line 6, `residuals = DataFrame(model_fit.resid)`: `NameError: name 'DataFrame' is not defined`. Do you think it is related to the previous error? –  Jun 20 '20 at 16:16
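The NameError is a separate problem from the ValueError: the question's code calls DataFrame without ever importing it, which is why the answer writes pd.DataFrame. A minimal sketch of two equivalent fixes, where the residual values are placeholders standing in for model_fit.resid:

```python
import pandas as pd             # fix 1: import the module and call pd.DataFrame
from pandas import DataFrame    # fix 2: import the name DataFrame directly

# Placeholder values standing in for model_fit.resid
resid = pd.Series([0.5, -1.2, 0.3])

residuals_a = pd.DataFrame(resid)
residuals_b = DataFrame(resid)
print(residuals_a.equals(residuals_b))  # True
```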