A colleague gave me a problem and I am having a hard time coming up with possible solutions. I have a dataset where every row represents one unit of a product that we make here, and the columns hold the values of many different factors recorded during its production (for example length, weight, temperature, etc.). Sometimes a unit comes out badly defective and cannot be sold to the customer. Since we don't know why these defects occur, we want to look at this dataset and use machine learning algorithms in R to find out whether there is anything different or unusual about the defective products (for example, a temperature that is way above average).

I guess what I'm asking is whether there is some method, algorithm, or study anybody can point me to so I can learn more about this. Thank you very much for any help!

1 Answer

There are many different methods that might suit your needs. For example, if your defective examples are labelled as such, you can try a simple binary classification using a standard machine learning algorithm (SVM, Naive Bayes, Random Forest, etc.).
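As a minimal sketch of that labelled-data route: the example below uses logistic regression through base R's `glm()`, swapped in for the algorithms named above only because it needs no extra packages. The data are simulated, and the column names (`temperature`, `weight`, `defect`) and the link between temperature and defects are assumptions made for illustration.

```r
set.seed(42)

# Simulated production data (hypothetical columns); load your real data instead.
n <- 500
products <- data.frame(
  temperature = rnorm(n, mean = 200, sd = 5),
  weight      = rnorm(n, mean = 50,  sd = 2)
)

# Assumption for the simulation only: defects get more likely at high temperature.
p_defect <- plogis(-4 + 0.5 * (products$temperature - 200))
products$defect <- rbinom(n, size = 1, prob = p_defect)

# Binary classifier: defect (1) vs. no defect (0).
fit <- glm(defect ~ temperature + weight, data = products, family = binomial)

# Coefficients and p-values hint at which factors separate defective products.
print(round(summary(fit)$coefficients, 3))

# Predicted defect probability for every product.
products$p_hat <- predict(fit, type = "response")
```

The coefficient table is what speaks to the "why" part of the question: a factor with a large, clearly non-zero coefficient is one whose values differ between defective and good products.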

In your case, though, anomaly detection algorithms might be more appropriate. The idea here is to train a classifier to recognise a single class of examples (the "normal" class); anything it does not recognise might be an anomaly, which in your case means a defective product. You can take a look at one-class classification using SVM with the caret package (see similar questions such as One-class classification with SVM in R). Another algorithm you can try is an autoencoder for anomaly detection (as described in Predicting Fraud with Autoencoders and Keras). This assumes that the autoencoder's reconstruction error will be higher for defective examples than for non-defective ones.
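To make the "learn the normal class, flag whatever does not fit" idea concrete, here is a rough sketch that uses Mahalanobis distance in base R as a simpler stand-in for the one-class SVM and autoencoder mentioned above. The data are simulated, the column names are hypothetical, and the 99% cutoff is an assumption you would need to tune.

```r
set.seed(1)

# Simulated measurements of known-good products (hypothetical columns).
normal <- data.frame(
  temperature = rnorm(300, mean = 200, sd = 5),
  weight      = rnorm(300, mean = 50,  sd = 2)
)

# Learn what "normal" looks like: feature means and covariance.
center <- colMeans(normal)
covmat <- cov(normal)

# New products to score: one typical, one with a far too high temperature.
new_products <- data.frame(
  temperature = c(201, 260),
  weight      = c(50, 50)
)

# Squared Mahalanobis distance of each new product from the normal class.
d2 <- mahalanobis(new_products, center, covmat)

# Flag products beyond the 99% chi-squared quantile (cutoff is an assumption).
cutoff <- qchisq(0.99, df = ncol(normal))
new_products$anomaly <- d2 > cutoff
print(new_products)
```

Like the one-class approaches, this only needs examples of good products for fitting. It does assume the normal data are roughly elliptical, which a one-class SVM or an autoencoder does not require.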

If I were in your shoes, I would try out these anomaly detection algorithms, as they seem to fit your description of the problem.

Cheers :)

lsfischer
  • One thing that you forgot is that the dataset will have data for multiple products. Is it not possible that something that is perfectly normal for one product could be classified as an anomaly for another product? The dataset might not have this problem, but what if it did? – secretive May 20 '19 at 20:58
  • Thank you very much for your answer! As @rajatkabra mentioned, is it possible that something that is normal for one product can be an anomaly for another? Anyway, thank you again for the answer. I'll go and look into it right away! :) – Martin Šenitka May 21 '19 at 07:12
  • @rajatkabra is right, that might happen and you should be aware of it. In that case you can either separate the products and train a classifier for each of them or, if your examples are correctly labelled, just use multiclass classification. – lsfischer May 21 '19 at 13:26