I have the following dataset:
import pandas as pd
from pmdarima import auto_arima
url_train_a = 'https://raw.githubusercontent.com/oreilly-mlsec/' + \
'book-resources/master/chapter3/datasets/cpu-utilization/cpu-train-a.csv'
df_train_a = pd.read_csv(url_train_a, parse_dates=[0], infer_datetime_format=True)
And I can see that the datetime column was parsed into the right format:
df_train_a.head(10)
datetime cpu
0 2017-01-27 18:42:00 1.14
1 2017-01-27 18:43:00 1.10
2 2017-01-27 18:44:00 1.09
3 2017-01-27 18:45:00 1.08
4 2017-01-27 18:46:00 1.08
5 2017-01-27 18:47:00 1.08
6 2017-01-27 18:48:00 1.15
7 2017-01-27 18:49:00 1.13
8 2017-01-27 18:50:00 1.09
9 2017-01-27 18:51:00 1.06
But when trying to apply the auto_arima function, I get this error:
stepwise_model = auto_arima(df_train_a, start_p=1, start_q=1,
                            max_p=3, max_q=3, m=12,
                            start_P=0, seasonal=True,
                            d=1, D=1, trace=True,
                            error_action='ignore',
                            suppress_warnings=True)
TypeError: The DType <class 'numpy.dtype[datetime64]'> could not be promoted by <class 'numpy.dtype[float64]'>. This means that no common DType exists for the given inputs. For example they cannot be stored in a single array unless the dtype is `object`. The full list of DTypes is: (<class 'numpy.dtype[datetime64]'>, <class 'numpy.dtype[float64]'>)
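The message reads like a NumPy dtype-promotion failure rather than something specific to my data. Here is a minimal sketch that seems to reproduce the same kind of error without pmdarima. It assumes the library tries to put both columns into a single typed array, which I have not confirmed from its source. The small `sample` frame is made up to mirror the first rows shown above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the first rows of df_train_a (no download needed)
sample = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2017-01-27 18:42:00",
        "2017-01-27 18:43:00",
        "2017-01-27 18:44:00",
    ]),
    "cpu": [1.14, 1.10, 1.09],
})

# The frame carries two different dtypes: a datetime64 one and float64
column_dtypes = sample.dtypes.tolist()

# Ask NumPy for a common dtype of both columns; datetime64 and float64
# have no common dtype, so this is expected to raise a TypeError
try:
    np.result_type(*column_dtypes)
    promotion_failed = False
except TypeError as exc:
    promotion_failed = True
    print(type(exc).__name__, exc)
```

If this is the same mechanism, the error would be triggered by the mix of column types in the frame I pass in, not by missing values.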
I tried to detect some NaNs (perhaps NaNs should be converted to NaTs) and checked this:
df_train_a[df_train_a['datetime'].isnull()]
# No NaNs detected
df_train_a.describe(datetime_is_numeric=True)
datetime cpu
count 420 420.000000
mean 2017-01-27 22:11:29.999999744 1.233262
min 2017-01-27 18:42:00 0.570000
25% 2017-01-27 20:26:45 0.787500
50% 2017-01-27 22:11:30 1.110000
75% 2017-01-27 23:56:15 1.582500
max 2017-01-28 01:41:00 2.550000
std NaN 0.505668
What am I doing wrong? Is it a problem with the library?
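One thing I have not tried yet, assuming auto_arima expects a one-dimensional numeric series rather than the whole frame: move the timestamps into the index and pass only the cpu values. The sketch below only prepares such a series on a made-up three-row frame; the names `sample` and `cpu_series` are mine, and the auto_arima call is left as a comment because I have not verified that it resolves the error.

```python
import pandas as pd

# Hypothetical stand-in for df_train_a with the same two columns
sample = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2017-01-27 18:42:00",
        "2017-01-27 18:43:00",
        "2017-01-27 18:44:00",
    ]),
    "cpu": [1.14, 1.10, 1.09],
})

# Keep the timestamps as the index and select only the numeric column,
# so the values handed to the model are a single float64 series
cpu_series = sample.set_index("datetime")["cpu"]

print(cpu_series.dtype)   # dtype of the values only
print(cpu_series.shape)   # one-dimensional

# Untested next step (assumption): pass the series instead of the frame
# stepwise_model = auto_arima(cpu_series, start_p=1, start_q=1, ...)
```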