
My question is: are there minimisation algorithms, preferably implemented in Python, that can operate on a function that is slow (~1-10s) and takes its data from a live system, such that the whole minimisation doesn't take more than a couple of hours to complete?

I have an FPGA that runs a filter over some sensor data, and uses the output of this filter to improve the performance of another device. I would like to find the optimal filter. My attempts at modelling the system and using various signal processing techniques did not produce adequate results, so now I'm going to attempt to solve this on the live system itself (if anything, just to prove that such an optimal filter is possible).

The filter can be programmed over the serial line, and the performance of the other device can be measured over the serial line.

So I can construct a function which:

  • Takes parameters that define a filter
  • Programs the filter via the serial line
  • Acquires data via the serial line
  • Computes a measure of how good the filter is (in the sense that smaller is better)
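In code, the wrapper would look something like this (the serial helpers and the figure of merit below are hypothetical placeholders for my actual serial-line code):

```python
import numpy as np

def program_filter(params):
    """Hypothetical helper: write filter coefficients over the serial line (~1.5s)."""
    ...

def acquire_data():
    """Hypothetical helper: read a block of performance data over the serial line (~6s)."""
    ...

def objective(params):
    """Minimisation target: program the filter, measure, score (smaller is better)."""
    program_filter(params)
    data = acquire_data()
    return float(np.mean(np.square(data)))  # placeholder figure of merit
```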

This means I have a function that can be used as a target for minimisation. Here are the problems though:

It's slow

To program the filter takes about 1.5s, to acquire the data to measure the goodness of the filter takes about 6s. All up, that's nearly 8s per function call. In other words, calling it just 500 times would take more than an hour. Even speeding up the communications and computation would probably not change this by an order of magnitude.

It's not well defined

(Note that x below is a vector in the parameter space of my target function.)

To put it simply, x1 == x2 does not imply f(x1) == f(x2). Due to the noisiness of the system, sampling the target function f(x) at the same point in its parameter space can yield different results each time.

The first thing that occurred to me was to have the target function actually average several measurements, and to increase the tolerance value of whatever minimisation routine I'm running. But looking at the actual numbers: in the worst case I could have the (mean) value of f(x) change by 2.0 over the full range of parameters, while the sample standard deviation is 1.6. This means that if I want to reduce the standard error (s/sqrt(n)) to, say, 0.1 I'd need to measure the same point about 250 times, which makes each measurement take over half an hour. Yay.
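To make the arithmetic explicit (numbers from above):

```python
s = 1.6                  # sample standard deviation at a single point
target_se = 0.1          # desired standard error of the mean
n = (s / target_se)**2   # = 256 repeats, i.e. roughly 250
print(n, n * 8 / 60)     # 256 calls at ~8s each is ~34 minutes per point
```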

There are tricks I can pull to improve this, say to get a swing of ~20 over the parameter range with a standard deviation of 0.25 at any given point. But these tricks have other tradeoffs in time.

Concessions

On the bright side, plotting the function (greatly averaged) over the whole optimisation space (which I've done to confirm that there is indeed a global minimum) shows that the thing is actually reasonably smooth and the minimum value is not too sharp. The other bright side is that the metric only needs to be optimised to two or three significant figures. If it were not so slow, optimising it would be easy.

I've started looking at the minimisation routines in SciPy, but since many of the parameters are undocumented or interdependent, it's a bit of a walk in the dark (with each step taking several hours).

It strikes me that what I really need is an optimisation algorithm that is known to converge in the fewest function calls; although maybe there's another approach that I haven't considered.

detly
  • Interesting question. How many parameters do you need to optimise? – Sven Marnach Dec 12 '11 at 01:43
  • Does "noisy" mean that the values can differ slightly form the "real" value or does it mean that there can be completely "nonsense" values? – sth Dec 12 '11 at 01:45
  • An idea to at least make the experiments easier would be to substitute the target function by some simple function with similar features and artificial noise. You could do experiments in just a few seconds, and hope that what works best for the fake function also works best for the real thing. – Sven Marnach Dec 12 '11 at 01:46
  • @SvenMarnach - At the moment I'm attempting to optimise the filter separately at each frequency band defined by the N-points of the filter, so I have two parameters to optimise over (phase and amplitude of a sinusoidal response). I'm currently trying to figure out whether I can use weighted basis functions over the whole frequency range instead, so that the weights are the parameters and I can do the whole thing at once. – detly Dec 12 '11 at 01:48
  • @sth - no nonsense values, just significant fluctuations from the mean at each point. – detly Dec 12 '11 at 01:49
  • @SvenMarnach - ah, that's a great idea! Works for testing particular optimisation routines for number of function calls, too. – detly Dec 12 '11 at 01:50
  • 2
  • BTW--This would have been a very reasonable question on the newly opened [beta site for scientific computation](http://scicomp.stackexchange.com/), which is not to say that it is off topic here. – dmckee --- ex-moderator kitten Dec 12 '11 at 02:06

3 Answers


The package scikit-optimize (skopt) is designed for exactly this setting: slow, noisy objective functions. It uses Gaussian processes to model the target function, and it switches between evaluating points that are uncertain (to improve the model) and points that are likely to be good. Their examples use ~100 evaluations to recover the minimum. There is even an interface aimed at physical experiments, where it proposes trial values, you run the experiment, you feed it the results, and it proposes more trial values.
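A minimal sketch of that ask/tell loop (the parameter bounds, call budget, and the `objective` function are placeholders):

```python
from skopt import Optimizer

# Two filter parameters (e.g. phase and amplitude); the bounds here are
# placeholders for whatever the filter actually accepts.
opt = Optimizer(dimensions=[(-3.14, 3.14), (0.0, 1.0)],
                base_estimator="GP",   # Gaussian-process surrogate
                acq_func="EI")         # expected improvement

for _ in range(100):         # ~100 evaluations at ~8s each is ~13 minutes
    x = opt.ask()            # skopt proposes a trial point
    y = objective(x)         # run it on the live system (slow, noisy)
    opt.tell(x, y)           # feed the result back into the surrogate model

result = opt.get_result()
print(result.x, result.fun)  # best parameters and best observed value
```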

user2475529

I think that this is a reasonable use case for a Metropolis optimization. It is one of the earliest examples of a Markov chain Monte Carlo method and can be applied more or less unchanged to your use case.

On each step you propose a random step in your parameter space and define the fitness as exp(-(1/thing_to_minimize)). Accept any proposed step where the fitness has grown, and accept the others with probability equal to the ratio proposed_fitness/current_fitness. After it's been running for a while, simply start averaging the location in parameter space.

You can add a simulated annealing aspect by reducing the mean step size as a function of time for an extra frill.
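Something like the following (a rough, untuned sketch; the step size, annealing schedule, and iteration budget are guesses, and the fitness here is the Boltzmann-style exp(-metric), so that smaller metric values are favoured):

```python
import numpy as np

rng = np.random.default_rng()

def metropolis_minimise(objective, x0, n_steps=300, step=0.5):
    """Metropolis walk that favours small objective values, returning the
    average visited location after a crude burn-in.
    (300 steps at ~8s per call is about 40 minutes.)"""
    x = np.asarray(x0, dtype=float)
    f = objective(x)
    visited = []
    for i in range(n_steps):
        x_new = x + rng.normal(scale=step, size=x.shape)
        f_new = objective(x_new)
        # Always accept improvements; otherwise accept with probability
        # exp(-(f_new - f)), i.e. the ratio of the two fitness values.
        if f_new <= f or rng.random() < np.exp(-(f_new - f)):
            x, f = x_new, f_new
        if i >= n_steps // 2:   # crude burn-in: average the second half only
            visited.append(x.copy())
        step *= 0.995           # optional simulated-annealing frill
    return np.mean(visited, axis=0)
```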


I've written about this on Stack Overflow a few times before, but you'll find my most complete description on Software to Tune/Calibrate Properties for Heuristic Algorithms.

dmckee --- ex-moderator kitten
  • I'm not sure how your fitness function incorporates the metric that I already have. Also, what are the constraints on `time-to-completion`? As in, would a simple decrementing counter work? – detly Dec 12 '11 at 05:17
  • Sorry, my mind must have wandered while I wrote that. The general point is to get a metric to *maximize*, and you get optimal convergence if it is exponential as well. – dmckee --- ex-moderator kitten Dec 12 '11 at 05:24

This is not a solution, but rather a grab bag of things to consider. I'd use something like the following procedure: approximate your function using N samples, pick a new point based on the approximation, and iterate. I've used similar techniques on noisy data with a large number of parameters. Here's a bit more detail (a concrete sketch follows the list):

  • Approximate your function using N values (maybe weighted in some way). Some options for this are:

    • RANSAC
    • least squares approximation
    • maximum likelihood estimator

    The one you choose would depend upon what you expect from the behaviour of the error.

  • Pick a new sample location based on the approximated function, add it to your set of N samples, and throw out one of the existing points. Again, there are several ways to do this; it partially depends upon your choice of approximating function. Some options include:

    • Just jump to the minimum of the approximated function.
    • One step of steepest descent (slow, but with good convergence properties)
    • One step of conjugate gradient (better rate of convergence, but doesn't always converge)

    There are a great many other options too.

    How you throw out one of the N points is also up for debate. Options might be:

    • One at random
    • Oldest
    • Furthest from newest sample
    • Furthest from "best" point
    • The one which deviates most from the approximate model.
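To make one path through those choices concrete, here's a rough sketch assuming two parameters, a least-squares quadratic surrogate, a jump to its minimum, and oldest-point replacement (all of these choices are placeholders, not recommendations):

```python
import numpy as np

def quadratic_design(X):
    """Design matrix for a full quadratic surface in two parameters."""
    x, y = X[:, 0], X[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x * x, y * y, x * y])

def propose_next(X, f):
    """Least-squares quadratic fit through the N stored samples (N >= 6),
    then jump to the fitted surface's stationary point."""
    c, *_ = np.linalg.lstsq(quadratic_design(X), f, rcond=None)
    # Stationary point of c0 + c1*x + c2*y + c3*x^2 + c4*y^2 + c5*x*y:
    H = np.array([[2 * c[3], c[5]],
                  [c[5], 2 * c[4]]])   # Hessian of the fitted surface
    g = np.array([c[1], c[2]])         # linear part of the gradient
    return np.linalg.solve(H, -g)

# Loop: measure the proposed point, drop (say) the oldest of the N samples,
# refit, and repeat. Bounds checks and a fallback for fits whose stationary
# point is a saddle or a maximum are omitted for brevity.
```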
Michael Anderson