
This is a general question: is anyone aware of a library, along the lines of sklearn, with a function that reads data and, once the user has specified what type of data it is, reports any strange behavior or quality concerns in it? For example:

  • Flat values for an extended period of time (e.g. the variance of the last N time-series records suddenly dropping to 0)
  • Sudden jumps in the data (e.g. a value cliff-dropping to 0 and jumping back up to normal, or an extremely high rate of change)
  • And so on...
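
As a rough illustration of the first two checks above, here is a minimal sketch using only the Python standard library. The function names `flat_runs` and `sudden_jumps`, the `window`, `tol`, and `max_step` parameters, and the sample readings are all made up for illustration; they do not come from any existing library, and the thresholds would need tuning for real data.

```python
from statistics import pvariance


def flat_runs(values, window=5, tol=1e-12):
    """Return indices where the variance of the trailing `window` values is ~0."""
    flagged = []
    for i in range(window - 1, len(values)):
        # Population variance of the last `window` records ending at index i.
        if pvariance(values[i - window + 1 : i + 1]) <= tol:
            flagged.append(i)
    return flagged


def sudden_jumps(values, max_step):
    """Return indices whose absolute change from the previous value exceeds `max_step`."""
    return [
        i
        for i in range(1, len(values))
        if abs(values[i] - values[i - 1]) > max_step
    ]


# Hypothetical readings: a drop to 0 and back, followed by a flat stretch.
readings = [10.1, 10.4, 9.8, 0.0, 10.2, 10.0, 10.0, 10.0, 10.0, 10.0]

print(sudden_jumps(readings, max_step=5.0))  # → [3, 4]
print(flat_runs(readings, window=4))         # → [8, 9]
```

The drop to 0 is flagged twice (once going down, once coming back up), and the flat stretch is flagged only once a full window of identical values has accumulated.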

Example (Good): [image not shown]

(Bad - Dropping to 0): [image not shown]

(Bad - Flat/constant value when non-constant is expected): [image not shown]

If such a library already exists, I would appreciate it if someone could point me to it, so I can avoid "re-inventing the wheel" and see what other checks there might be that I have not thought of.

Ben C Wang
  • I think the term I’ve heard for this is “anomaly detection.” I don’t know if there are any libraries that are focused on that or do that mostly automatically, but I think `IsolationForest` from sklearn might be useful. – fakedad Jul 10 '23 at 16:40
  • Thanks, I think that is enough for me to go off of and do some more research. I'll share my results if I find anything useful after I get out of this rabbit hole! – Ben C Wang Jul 10 '23 at 17:34
  • Hey! Not sure if you found the solution you're looking for, but I created a library in Python here: https://github.com/SuperiorityComplex/data_checks that basically lets you write and deploy these types of checks with just Python code. Anomaly detection example: https://github.com/SuperiorityComplex/data_checks/blob/main/examples/general/anomaly_detection/checks/anomaly_detection_check.py – josh Aug 30 '23 at 17:29
