General guidelines on setting the parameters of an algorithm when training data is given

Question

Suppose the algorithm has two parameters:

para_a, an integer between 10 and 30
para_b, a float between 0 and 1

The output of the algorithm is between 0 and 1, and a higher output means the parameters have been set better. My question is how to set the parameters of the algorithm when some testing data is given. My solution is as follows:

Step 1: choose para_a from 10 to 30, and choose para_b from 0 to 1 in steps of 0.1.
Step 2: for each parameter combination, calculate the output of the algorithm on the given testing data.
Step 3: choose the combination that has led to the highest output value.

This is an intuitive solution to parameter setting, but it cannot guarantee that the best parameters are found. Are there more elegant solutions?

Why do you say that *we cannot guarantee the best parameter with the solution*? Your search seems exhaustive, unless the algorithm is too sensitive to para_b (which is usually bad news for the algorithm's design) - A.S.H, Sep 28 '15 at 11:44

score 0 · Answer 1 · edited May 23 '17 at 11:51

The question is very broad and a bit vague, but I think some general hints can be given here:

The problem, as you described it, can be considered a (comparatively simple) optimization problem. There are many different forms of these problems: they can be classified as linear or nonlinear, continuous or discrete, bounded or unbounded, and so on. The best way to solve such a problem depends on this general classification and, beyond that, on which additional information is available - for example, whether derivatives of the objective function can be computed.

From the description given so far, one has to assume that the problem is a "black box": You have no idea which input will result in which output. But there is probably at least some additional information available. For example, the objective function as you described it might be continuous: When you have inputs and outputs like

[10, 0.4999] -> 0.6999
[10, 0.5000] -> ???
[10, 0.5001] -> 0.7001

then (for most real world problems), the output for the second case is "likely" to be close to 0.7. But this is an assumption that has to be validated.
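To make the continuity assumption concrete, here is a minimal sketch in Python that estimates the missing output by linear interpolation between the two neighbouring samples. The helper name `interpolate` is hypothetical, and the estimate is only meaningful if the objective really is smooth in that region.

```python
def interpolate(x0, y0, x1, y1, x):
    """Linearly interpolate the output at x from two neighbouring samples."""
    t = (x - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)


# Samples from the example above, with para_a fixed at 10.
estimate = interpolate(0.4999, 0.6999, 0.5001, 0.7001, 0.5000)
print(round(estimate, 4))
```

Comparing such an estimate with the real output at a few held-back points is one simple way to check whether the assumption holds.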

If you really have no additional information about the objective function, then you can hardly do more than a "brute force search" as you already described it: You can systematically try all values between 10 and 30 for the first parameter, and a sample of the values between 0.0 and 1.0 for the second parameter.
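A brute force search of this kind could be sketched as follows in Python. The function `toy_objective` is a made-up stand-in for the real algorithm (assumed here to peak at para_a = 20, para_b = 0.37), and `grid_search` is a hypothetical helper name; in practice `evaluate` would run the algorithm on the given data.

```python
def grid_search(evaluate, a_values, b_values):
    """Try every (para_a, para_b) combination and keep the best-scoring one."""
    best_params, best_score = None, float("-inf")
    for a in a_values:
        for b in b_values:
            score = evaluate(a, b)
            if score > best_score:
                best_params, best_score = (a, b), score
    return best_params, best_score


def toy_objective(para_a, para_b):
    # Hypothetical objective, NOT the real algorithm: output stays in [0, 1].
    return 1.0 - ((para_a - 20) / 20.0) ** 2 - (para_b - 0.37) ** 2


a_values = range(10, 31)                  # all integers 10..30
b_values = [i / 10 for i in range(11)]    # 0.0, 0.1, ..., 1.0
best_params, best_score = grid_search(toy_objective, a_values, b_values)
print(best_params, best_score)
```

With this toy objective the search evaluates 21 × 11 = 231 combinations and can only return a para_b that lies on the 0.1 grid.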

(Of course, this sampling may actually miss the "best" value. But note that nearly all approaches for avoiding this, or for "more clever searches" in general, will at least implicitly compute the derivatives of the objective function...)
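One simple way to reduce the risk of missing the best value is to repeat the sampling on a finer grid around the best point found so far. The sketch below does this for para_b with para_a held fixed. It relies on the objective being reasonably smooth near the coarse optimum, so it is not guaranteed to find the global best; `toy_objective`, `best_b` and `refine_b` are hypothetical names used only for illustration.

```python
def toy_objective(para_a, para_b):
    # Hypothetical objective, NOT the real algorithm: peak at para_b = 0.37.
    return 1.0 - ((para_a - 20) / 20.0) ** 2 - (para_b - 0.37) ** 2


def best_b(evaluate, para_a, lo, hi, steps):
    """Return the best of steps + 1 evenly spaced para_b values in [lo, hi]."""
    candidates = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(candidates, key=lambda b: evaluate(para_a, b))


def refine_b(evaluate, para_a, lo=0.0, hi=1.0, steps=10, rounds=3):
    """Coarse-to-fine search: shrink the interval around the best sample."""
    b = lo
    for _ in range(rounds):
        b = best_b(evaluate, para_a, lo, hi, steps)
        width = (hi - lo) / steps
        # Next round searches one grid cell on either side of the best value.
        lo, hi = max(0.0, b - width), min(1.0, b + width)
    return b


refined_b = refine_b(toy_objective, 20)
print(refined_b)
```

Each round costs only `steps + 1` evaluations, so three rounds use 33 evaluations instead of the several hundred a single uniform grid of the same final resolution would need.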

Depending on whether you can classify the problem more precisely, there are several optimizers available in the Apache Commons Math "Optim" package. I recently posted an example here.
