
Normally, machine learning systems perform well. However, when there is a problem with a trained system (for example, it performs worse than random ...), a great "guessing game" begins. By "guessing game" I am alluding to my own experience: it seems to me that machine learning systems are most often debugged by guessing at the problem rather than by following a methodical procedure.

Since there are numerous reasons why a machine learning system may fail, finding the actual bug can be quite time-consuming. For example, the bug may be due to:

  • biased training dataset
  • insufficient training data
  • datasets containing errors
  • unrepresentative features, or too many features
  • sloppy training (for example, in neural networks, when the training data is not presented in random order)
  • ...

Is there a machine learning system that is easy to debug? (And how can it be debugged?)

Is there a known methodical way of debugging machine learning systems at all?

lmjohns3
quant

1 Answer


What you refer to as "debugging" is known as optimizing in the machine learning community. There are ways to optimize a classifier that depend on the classifier and on the problem, but there is no standard procedure. For example, in a text classification problem you might find through experiments that training your classifier with certain features improves its performance. There are methods for selecting the feature combination that yields the highest classification accuracy. Some of them use a genetic algorithm to search for the best combination. One method you can read up on is sequential feature selection, which greedily adds (or removes) one feature at a time and keeps the change only if it improves performance. There are also many papers on these topics that you might find useful. Additionally, some studies change the classification function or other computations in a classifier implementation to achieve better results.
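To make the idea of sequential feature selection concrete, here is a minimal sketch of the forward variant in plain Python. Everything in it is an assumption chosen for illustration: the nearest-centroid classifier, the synthetic data (feature 0 is informative, features 1 and 2 are noise), and the function names. It is not taken from any particular library or paper.

```python
import random


def nearest_centroid_accuracy(train, test, features):
    """Accuracy of a nearest-centroid classifier that only sees `features`."""
    sums, counts = {}, {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(features))
        for i, f in enumerate(features):
            acc[i] += x[f]
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}

    def predict(x):
        # Pick the class whose centroid is closest in squared distance.
        return min(
            centroids,
            key=lambda y: sum((x[f] - c) ** 2 for f, c in zip(features, centroids[y])),
        )

    return sum(predict(x) == y for x, y in test) / len(test)


def sequential_forward_selection(train, valid, n_features):
    """Greedily add the feature that most improves validation accuracy."""
    selected, best_score = [], 0.0
    remaining = list(range(n_features))
    while remaining:
        scored = [
            (nearest_centroid_accuracy(train, valid, selected + [f]), f)
            for f in remaining
        ]
        score, feature = max(scored)
        if score <= best_score:
            break  # no remaining feature helps, so stop
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


def make_data(n, rng):
    """Synthetic data (hypothetical): feature 0 separates the classes, 1 and 2 are noise."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [rng.gauss(3.0 * y, 1.0), rng.gauss(0.0, 5.0), rng.gauss(0.0, 5.0)]
        data.append((x, y))
    return data


rng = random.Random(0)
train, valid = make_data(300, rng), make_data(200, rng)
selected, score = sequential_forward_selection(train, valid, n_features=3)
print("selected features:", selected, "validation accuracy:", round(score, 3))
```

The score is measured on a validation split rather than on the training data, so a feature is only kept if it helps on examples the classifier was not fitted to.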

Having said that, some ways of optimizing a classifier are considered cheating and should be avoided. This is usually the case when a classifier is tuned to solve a problem on a single dataset, or on highly similar datasets, and the gains do not carry over to previously unseen datasets.
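A small simulation can show why tuning against a single dataset is misleading. The sketch below is only an illustration under made-up assumptions: the labels are coin flips, and the 200 "classifier variants" are hypothetical stand-ins that guess at random, so none of them can genuinely do better than 50%.

```python
import random

rng = random.Random(0)


def accuracy(predictions, labels):
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Labels are pure coin flips, so no classifier can truly beat 50% here.
n_select, n_holdout = 50, 1000
labels = [rng.randint(0, 1) for _ in range(n_select + n_holdout)]

# 200 hypothetical "classifier variants" that just guess at random.
candidates = [[rng.randint(0, 1) for _ in labels] for _ in range(200)]

# "Optimize" by keeping the variant that scores best on the first 50 examples.
best = max(
    candidates,
    key=lambda preds: accuracy(preds[:n_select], labels[:n_select]),
)

selection_acc = accuracy(best[:n_select], labels[:n_select])
holdout_acc = accuracy(best[n_select:], labels[n_select:])
print(f"accuracy on the data used for selection: {selection_acc:.2f}")
print(f"accuracy on unseen data:                 {holdout_acc:.2f}")
```

The variant that wins on the small selection set looks clearly better than chance there, while on the unseen examples it falls back to roughly 50%. Reporting results only on data that played no part in any tuning decision avoids this trap.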

user823743