
Which methods are best for managing, predicting, and labeling data in a dynamic environment? The system's data distribution changes over time; it is not static. The system can operate under different normal settings, and each setting has its own normal data distribution. Suppose we have two classes, normal and abnormal. What happens then? We cannot simply rely on historical data and train a classifier to predict future observations, because the data distribution may change the day after training, making old observations irrelevant to new ones. Consider the following figure:

[Figure: normal-data distributions of one sensor under two different settings (blue and red)]

The blue and red distributions are both normal data, but under different settings, and at training time we have only one setting. This data comes from a single sensor. Suppose we train a model on the blue distribution, together with some abnormal samples. Think of the abnormal samples as normal samples with a little noise or a fault in the measurements. Then we want to test the model, but the setting has changed and our test observations now follow the red distribution. As a result, the model misclassifies the samples.
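The failure mode described above can be reproduced with a small simulation. This is a minimal sketch, assuming Gaussian normal data with made-up parameters (blue: mean 0, red: mean 5, both with unit variance) and a hypothetical `fit_threshold`/`predict_abnormal` pair that learns a "normal band" of mean +/- 3 standard deviations from the training setting only:

```python
import numpy as np


def fit_threshold(normal_train, k=3.0):
    """Learn a simple normal band (mean +/- k * std) from training data."""
    return normal_train.mean(), normal_train.std(), k


def predict_abnormal(x, model):
    """Flag samples that fall outside the learned normal band."""
    mu, sigma, k = model
    return np.abs(x - mu) > k * sigma


rng = np.random.default_rng(0)
# Assumed parameters for illustration only: both sets are NORMAL data.
blue = rng.normal(loc=0.0, scale=1.0, size=1000)  # setting seen at training time
red = rng.normal(loc=5.0, scale=1.0, size=1000)   # new setting at test time

model = fit_threshold(blue)
false_alarm_blue = predict_abnormal(blue, model).mean()
false_alarm_red = predict_abnormal(red, model).mean()
print(f"false-alarm rate on blue (training setting): {false_alarm_blue:.3f}")
print(f"false-alarm rate on red (new setting):       {false_alarm_red:.3f}")
```

On the training setting the false-alarm rate stays close to the nominal 3-sigma level, while nearly all of the red samples, which are still normal, fall outside the learned band and are labeled abnormal.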

What are the best methods for a situation like this? Please note that I have tried several clustering algorithms, but they cannot distinguish between normal and abnormal samples.

Any suggestions or help are very welcome. Thanks!

Arkan

1 Answer


There are plenty of books on time-series data.

In particular, look at change detection. Your example can presumably be treated as a change in the mean, and there are statistical models to detect this.

Basseville, Michèle, and Igor V. Nikiforov. Detection of abrupt changes: theory and application. Vol. 104. Englewood Cliffs: Prentice Hall, 1993.
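One classical model of this kind is Page's CUSUM test for a change in mean, which the book above covers. The sketch below is a minimal two-sided tabular CUSUM, assuming the pre-change mean `mu0` and standard deviation `sigma` are known from training data and that the shift to detect is about 5 standard deviations (hence the drift `k = 2.5`, half the assumed shift). The threshold `h = 5` and the simulated stream are illustrative assumptions, not tuned values:

```python
import numpy as np


def cusum_mean_shift(x, mu0, sigma, k=0.5, h=5.0):
    """Two-sided tabular CUSUM for a shift in mean.

    k: drift (allowance) in units of sigma, usually half the shift to detect.
    h: decision threshold in units of sigma.
    Returns the index of the first alarm, or None if no alarm is raised.
    """
    s_pos = 0.0  # accumulates evidence for an upward shift
    s_neg = 0.0  # accumulates evidence for a downward shift
    for t, value in enumerate(x):
        z = (value - mu0) / sigma
        s_pos = max(0.0, s_pos + z - k)
        s_neg = max(0.0, s_neg - z - k)
        if s_pos > h or s_neg > h:
            return t
    return None


rng = np.random.default_rng(1)
before = rng.normal(0.0, 1.0, size=300)  # blue setting (assumed parameters)
after = rng.normal(5.0, 1.0, size=300)   # red setting (assumed parameters)
stream = np.concatenate([before, after])

alarm = cusum_mean_shift(stream, mu0=0.0, sigma=1.0, k=2.5, h=5.0)
print("change detected at index", alarm)
```

A larger `h` lowers the false-alarm rate at the cost of a longer detection delay; `k` is commonly set to half the smallest shift you want to catch.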

Has QUIT--Anony-Mousse
  • Thanks for your answer. Do you know of any useful academic papers or white papers on this topic? Or could you please name some of the models? Actually, I don't just want to detect the change point; I want to be able to distinguish between normal and abnormal samples after that change point. Thanks. – Arkan Jun 24 '18 at 10:09
  • Yes, I saw that. Should I read the whole book, or do you recommend a specific part? I thought there might be a paper that summarizes the book. Thanks. – Arkan Jun 24 '18 at 15:15
  • I don't know which parts will be most relevant for you. I guess the entire book is relevant, and you have to decide yourself which approaches to try. It's also a good starting point for finding additional literature. – Has QUIT--Anony-Mousse Jun 24 '18 at 17:17
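For the follow-up question (labeling samples after the change point), one simple option, not taken from the answer or the book, is to re-estimate the normal band from a short warm-up window right after the detected change and score later samples against it. This is a minimal sketch under loud assumptions: the warm-up window is taken to contain mostly normal data, and the change index is already known (e.g. from a change detector such as CUSUM). `relabel_after_change`, the window length, and the injected fault offsets are all hypothetical:

```python
import numpy as np


def relabel_after_change(x, change_idx, warmup=50, k=3.0):
    """Hypothetical helper: refit the normal band on the first `warmup`
    samples after the change, then flag later samples outside
    mean +/- k * std as abnormal.

    ASSUMPTION: the warm-up window contains (mostly) normal samples.
    Returns a boolean array for x[change_idx + warmup:].
    """
    ref = x[change_idx:change_idx + warmup]
    mu, sigma = ref.mean(), ref.std()
    test = x[change_idx + warmup:]
    return np.abs(test - mu) > k * sigma


rng = np.random.default_rng(2)
change_idx = 300  # assumed known here
stream = np.concatenate([
    rng.normal(0.0, 1.0, change_idx),  # blue setting (assumed parameters)
    rng.normal(5.0, 1.0, 500),         # red setting (assumed parameters)
])
fault_positions = np.array([450, 550, 650])
stream[fault_positions] += 8.0  # hypothetical measurement faults

flags = relabel_after_change(stream, change_idx, warmup=50)
offset = change_idx + 50  # index of the first scored sample
print("flagged indices:", np.flatnonzero(flags) + offset)
```

The main weakness is the warm-up assumption: faults inside the warm-up window inflate the estimated spread and can mask later abnormal samples.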