I work on a monitoring team, where we monitor our clients' load on our tools. We record latency as a time series.
Initially, I used a static threshold to raise anomaly alerts. However, that doesn't work when there is seasonality in the data. Now I am planning to apply machine learning to my data.
My data looks like:
volume_nfs_ops timestamp mount_point
---------------------------------------------------------
2103 6/28/2018 3:16 /slowfs/us01dwt2p311
12440 6/28/2018 6:03 /slowfs/us01dwt2p311
14501 6/28/2018 14:20 /slowfs/us01dwt2p311
12482 6/28/2018 14:45 /slowfs/us01dwt2p311
10420 6/28/2018 18:09 /slowfs/us01dwt2p311
7203 6/28/2018 18:34 /slowfs/us01dwt2p311
14104 6/28/2018 21:58 /slowfs/us01dwt2p311
6996 6/29/2018 7:35 /slowfs/us01dwt2p311
11282 6/29/2018 8:39 /slowfs/us01dwt2p311
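What I do today is essentially a single fixed cutoff, roughly like the minimal sketch below (the file name and the threshold value are just placeholders for illustration, not my real configuration):

```python
import pandas as pd

# My current rule: one fixed cutoff for the whole series.
# "nfs_ops.csv" and the 12000 threshold are placeholders.
THRESHOLD = 12000

df = pd.read_csv("nfs_ops.csv", parse_dates=["timestamp"])
alerts = df[df["volume_nfs_ops"] > THRESHOLD]
print(alerts[["timestamp", "mount_point", "volume_nfs_ops"]])
```

This is exactly what breaks down with seasonality: a value that is normal during a daily peak looks anomalous at night, and vice versa.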
When I searched on Google, I found ARIMA frequently recommended for time series. I am not strong on the mathematics, so I could not figure out whether ARIMA is actually a good fit for my data set.
My questions are: which algorithm would be best to implement in Python, and which factors should I consider when deciding whether a point is an anomaly?
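For concreteness, the sketch below is roughly the kind of ARIMA-based approach I was considering, using statsmodels' SARIMAX and flagging large residuals. It is only a sketch: the file name, the hourly resampling, and the (1,0,1)x(1,0,1,24) model order are assumptions I have not validated.

```python
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Put the irregular samples on a regular hourly grid.
# SARIMAX expects evenly spaced observations; hourly means are an assumption here.
df = pd.read_csv("nfs_ops.csv", parse_dates=["timestamp"])
series = (df.set_index("timestamp")["volume_nfs_ops"]
            .resample("1H").mean()
            .interpolate())

# Seasonal ARIMA with a daily cycle (24 hourly steps).
# The (1,0,1)x(1,0,1,24) order is a placeholder, not a tuned choice.
model = SARIMAX(series, order=(1, 0, 1), seasonal_order=(1, 0, 1, 24))
result = model.fit(disp=False)

# Flag points whose residual is more than 3 standard deviations out.
resid = result.resid
anomalies = series[(resid - resid.mean()).abs() > 3 * resid.std()]
print(anomalies)
```

Is residual-based thresholding like this a reasonable direction, or is a different method better suited to this kind of data?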