
I have a function F, [bool] = F(DATASET, tresh1, tresh2), that takes as input a DATASET and some parameters, for example two threshold values, tresh1 and tresh2, and returns a boolean: 1 if DATASET is "good", 0 otherwise. The answer depends on the values of tresh1 and tresh2, of course.

Suppose I have 100 DATASETs available and I know which ones are good and which are not. I would like to "train" my function F, i.e. teach it a pair of values tresh1_ and tresh2_ such that F(DATASET, tresh1_, tresh2_) returns "true" for all (or most of) the "good" DATASETs and "false" otherwise.

I expect that F(DATASET_, tresh1_, tresh2_), where DATASET_ is a new dataset (different from the previous 100), returns true if DATASET_ is really "good".

I could see this as a clustering problem: for every DATASET in the training set I choose random tresh1 and tresh2 values and mark which values make F return the correct answer and which do not. From this I select a region where the tresh1 and tresh2 values are "good". Is that a good method? Are there better ones?
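To make the idea concrete, here is a minimal Python sketch of that sampling approach. `F`, `datasets`, and `labels` (where `labels[i]` is True iff `datasets[i]` is known to be "good") are assumed to exist, and the [0, 1] search range is also an assumption:

```python
import random

# Assumed to exist: F(dataset, tresh1, tresh2) -> bool,
# datasets (the 100 labeled DATASETs), labels (True for the "good" ones).
trials = []
for _ in range(1000):
    t1 = random.uniform(0.0, 1.0)  # assumed range for tresh1
    t2 = random.uniform(0.0, 1.0)  # assumed range for tresh2
    n_correct = sum(F(d, t1, t2) == y for d, y in zip(datasets, labels))
    trials.append((n_correct, t1, t2))

# The (t1, t2) pairs with high n_correct outline the "good" region.
best_n, best_t1, best_t2 = max(trials)
```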

In general, it seems to me a "parameter calibration" problem. Do classic techniques exist to solve it?


1 Answer


What you want to do is commonly known as **hyperparameter optimization**.

See the Wikipedia article for details. The common approach is to perform a grid search, unless you can compute the derivatives of your function F.

This is a search method; it is commonly used in machine learning to optimize performance, but it is not a "machine learning" algorithm itself.
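For illustration, here is a minimal grid-search sketch in Python. `F`, `datasets`, and `labels` (True for the known-good DATASETs) are assumptions carried over from the question, as are the grid ranges:

```python
import itertools

import numpy as np

def grid_search(F, datasets, labels, tresh1_grid, tresh2_grid):
    """Return the (tresh1, tresh2) pair with the highest training accuracy."""
    best_acc, best_params = -1.0, None
    for t1, t2 in itertools.product(tresh1_grid, tresh2_grid):
        # Fraction of the labeled DATASETs that F classifies correctly.
        acc = np.mean([F(d, t1, t2) == y for d, y in zip(datasets, labels)])
        if acc > best_acc:
            best_acc, best_params = acc, (t1, t2)
    return best_params, best_acc

# Assumed search ranges; adapt them to the scale of your thresholds.
(tresh1_, tresh2_), acc = grid_search(F, datasets, labels,
                                      np.linspace(0.0, 1.0, 50),
                                      np.linspace(0.0, 1.0, 50))
```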

  • thanks! a related key concept is SVD (support vector machines), very interesting. – Giorgio Sep 24 '15 at 09:13
  • No, SVMs are not a related key concept, but a **use case**. SVMs (not SVD) have multiple parameters that need to be chosen for good performance... SVD (singular value decomposition) is something else. – Has QUIT--Anony-Mousse Sep 24 '15 at 09:22