
I work on a monitoring team, where we monitor our clients' load on our tools. We record latency as a time series.

Initially, I used a static threshold to flag anomalies. However, it doesn't work when the data is seasonal. Now I am planning to apply ML to my data.

My data looks like:


volume_nfs_ops   timestamp           mount_point
---------------------------------------------------------
2103             6/28/2018 3:16      /slowfs/us01dwt2p311
12440            6/28/2018 6:03      /slowfs/us01dwt2p311
14501            6/28/2018 14:20     /slowfs/us01dwt2p311
12482            6/28/2018 14:45     /slowfs/us01dwt2p311
10420            6/28/2018 18:09     /slowfs/us01dwt2p311
7203             6/28/2018 18:34     /slowfs/us01dwt2p311
14104            6/28/2018 21:58     /slowfs/us01dwt2p311
6996             6/29/2018 7:35      /slowfs/us01dwt2p311
11282            6/29/2018 8:39      /slowfs/us01dwt2p311

When I googled, I found claims that ARIMA is the best model for time series. I am not strong in mathematics and could not figure out whether ARIMA is a good fit for my data set.

My question is which algorithm is best to implement in Python? Which factors should I consider to find an anomaly?

paulina_glab
  • 1
    There is no "best" model for time series (or for anything else, for that matter) - everything depends on the specific problem. Please do take some time to read [What topics can I ask about here?](https://stackoverflow.com/help/on-topic), and notice that questions asking us to *recommend or find a book, tool, software library, tutorial or other off-site resource* are off-topic for Stack Overflow – desertnaut Jul 16 '18 at 22:10
  • Thanks for that. I tried a threshold limit, but it didn't work well. Now I have to move to the ML side. – sai sasank Jul 16 '18 at 22:33

1 Answer


There are plenty of anomaly detection techniques. Even to detect anomalies in time series data, one need not go into time series forecasting algorithms. A few approaches:

A) If you have known abnormalities, use a classification algorithm. In your case, maybe a threshold value could define what counts as abnormal? See:

https://machinelearningstories.blogspot.com/2018/07/anomaly-detection-anomaly-detection-by.html

B) If there are no known abnormalities in the data, then you need unsupervised abnormality detection algorithms: K-Means, LOF, CBF, PCA, Angular, etc.

C) Unsupervised algorithms (outliers from clustering) never give you abnormalities directly; what they give you are outliers. So if you feel your outliers represent abnormalities, use the clustering-based abnormality detection algorithms from (B).

D) Anomaly detection and time series are altogether separate areas of expertise. Don't confuse them. I can share some documents if you think you are looking for unsupervised abnormality detection algorithms.
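
To make option (B) concrete, here is a minimal sketch of LOF (Local Outlier Factor) in plain Python, run on the `volume_nfs_ops` values from the sample rows in the question. The assumptions are mine, not something from your setup: it uses only that one column with absolute difference as the distance, the neighbourhood size `k=3` is an arbitrary illustrative choice, and the helper name `lof_scores` is hypothetical. For real use, a library implementation such as scikit-learn's `LocalOutlierFactor` would probably be the more sensible route.

```python
from statistics import mean

# volume_nfs_ops values copied from the sample rows in the question
ops = [2103, 12440, 14501, 12482, 10420, 7203, 14104, 6996, 11282]


def lof_scores(values, k=3):
    """Return the Local Outlier Factor of each value (1-D, absolute distance).

    Hypothetical helper for illustration; k=3 is an assumed neighbourhood size.
    """
    n = len(values)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(values)")

    def dist(i, j):
        return abs(values[i] - values[j])

    # k nearest neighbours of each point (ties broken by index order)
    neighbours = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist(i, j))[:k]
        for i in range(n)
    ]
    # k-distance: distance from a point to its k-th nearest neighbour
    k_dist = [dist(i, neighbours[i][-1]) for i in range(n)]

    def reach_dist(i, j):
        # reachability distance of point i with respect to neighbour j
        return max(k_dist[j], dist(i, j))

    # local reachability density; the small epsilon guards against
    # division by zero when several values are identical
    lrd = [
        1.0 / (mean(reach_dist(i, j) for j in neighbours[i]) + 1e-12)
        for i in range(n)
    ]
    # LOF: average density of the neighbours relative to the point's own density
    return [mean(lrd[j] for j in neighbours[i]) / lrd[i] for i in range(n)]


scores = lof_scores(ops, k=3)
# print the values from most to least outlying
for value, score in sorted(zip(ops, scores), key=lambda pair: -pair[1]):
    print(f"{value:>6}  LOF={score:.2f}")
```

Scores near 1 mean a point sits in a region about as dense as its neighbours' regions; scores well above 1 mark candidates for outliers. As in (C), a high score only says the value is unusual relative to the others. Whether it is an actual abnormality (and not, say, a normal quiet hour at night) is something you still have to judge, for example by adding hour-of-day as a second feature.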

Arpit Sisodia