
I am writing a hill climbing algorithm where I want the step size to be proportional to the local derivative/gradient at a specific point. However, the algorithm seems to get stuck in a trough that I can't really explain; for example, given a starting point at (1.0, 1.0): (1.0, 1.0) -> (2.0, 0.0) -> (2.0, 3.5) -> (2.0, 3.8) -> (2.0, 5.5) -> (2.0, 5.4)

My algorithm uses a generator function that I have tested and that works perfectly fine on its own. For context, this is the generator I'm using:

import numpy as np


def generate_surrounding(pos: np.ndarray, delta: np.ndarray):
    """Yield the 8 points surrounding `pos`, offset by +/- delta,
    skipping any candidate with a negative coordinate."""
    dx, dy = delta
    x, y = pos

    dxs = [-dx, 0, dx]
    dys = [-dy, 0, dy]

    # all 9 offset combinations, then drop the centre (0, 0)
    X, Y = np.meshgrid(dxs, dys)

    X = list(X.flatten())
    Y = list(Y.flatten())
    del X[len(X) // 2]
    del Y[len(Y) // 2]

    for x_dx, y_dy in zip(X, Y):
        if (x_dx + x) < 0.0 or (y_dy + y) < 0.0:
            continue
        else:
            yield np.array([x_dx + x, y_dy + y])
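
As a quick sanity check (just me eyeballing the output, assuming the function above), calling it on the starting point prints the eight surrounding candidates:

for neighbour in generate_surrounding(np.array([1.0, 1.0]), np.array([1.0, 1.0])):
    print(neighbour)
# prints, one per line:
# [0. 0.] [1. 0.] [2. 0.] [0. 1.] [2. 1.] [0. 2.] [1. 2.] [2. 2.]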

To create a stopping condition, I put this generator inside a while loop. This is the block of code I believe is causing the issue, although I can't pin down where the problem is coming from. I have checked that all the variables are being assigned correctly, and it seems like something is going wrong inside the dictionary comprehension, but I'm not sure:

cost_function = lambda x, y: 2 * np.sin(y) + 2 * np.cos(x)
delta = np.array([1.0, 1.0])
pos = np.array([1.0, 1.0])
cost = cost_function(*pos)

while abs(delta[0]) > 0.01 or abs(delta[1]) > 0.01:
    # map each candidate's cost to its coordinates, then pick the cheapest
    costs_dict = {cost_function(*coord): coord for coord in generate_surrounding(pos, delta)}
    new_cost = min(costs_dict.keys())
    new_pos = costs_dict[new_cost]
    # make the next step size proportional to the local slope
    d_cost = new_cost - cost
    d_pos_x, d_pos_y = new_pos[0] - pos[0], new_pos[1] - pos[1]
    delta[0] = abs(d_cost / d_pos_x) if d_pos_x != 0 else delta[0]
    delta[1] = abs(d_cost / d_pos_y) if d_pos_y != 0 else delta[1]
    pos = new_pos
    cost = new_cost

Finally, here's the code I'm using to debug and visualize the cost function. This is just a simple cost function I made up; I imagine the actual cost function I use will be far more complicated:

import numpy as np
import matplotlib.pyplot as plt

x = np.arange(1, 30, 0.25)
y = x.reshape(-1, 1)
h = 2 * np.sin(y) + 2 * np.cos(x)

plt.contourf(h)
plt.colorbar()
plt.show()

Thanks for any help :)

  • Can you provide a minimal example where you call a function and it returns the optimum at x=2.0, y=?? There are a few reasons why this might happen, from a bug, to there being a local optimum at x = 2 and you needing a different optimizer. – Tom McLean Mar 27 '23 at 11:32
  • This is the minimal reproducible example; if you run this code it never converges to a value, but this is the terminal output when I add `print("delta: ", delta, " pos: ", pos)`: `delta: [1. 1.] pos: [1. 1.]`, `delta: [3.59584025 3.59584025] pos: [2. 0.]`, `delta: [3.59584025 0.24405219] pos: [2. 3.59584025]`, `delta: [3.59584025 1.67283913] pos: [2. 3.83989244]`, `delta: [3.59584025 0.06401637] pos: [2. 5.51273157]`, `delta: [3.59584025 1.38963974] pos: [2. 5.4487152]`, etc. – Lyndon Alcock Mar 27 '23 at 12:59
  • I think it's rather a problem with your optimization algorithm and the step size. You're just taking the derivative without any scaling, and there's no reason why this should converge to the minimum. As an example take `x^2` as the minimization function and suppose you start at some value `x=a` (`y` does not matter in this case). The derivative is `2a`, so `-a` and `3a` are selected as candidates for the next iteration, from which `-a` will be selected, since it has lower cost. Then the next will be `a`, then `-a`, and so on (see the sketch after these comments)... – Elmar Zander Mar 28 '23 at 12:48
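
A minimal sketch of the oscillation described in that last comment (this is only an illustration of the general idea, using a 1-D `x**2` cost and an unscaled-derivative step, not the question's exact code):

f = lambda x: x ** 2        # simple 1-D cost
x = 3.0                     # start at x = a
for _ in range(6):
    step = abs(2 * x)       # step "proportional to the derivative" (df/dx = 2x), no scaling
    candidates = [x - step, x + step]
    x = min(candidates, key=f)
    print(x)                # alternates -3.0, 3.0, -3.0, ... and never approaches 0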

1 Answer


It turns out what I was looking for is called gradient descent. This is what it looks like. I also found a better test function that only has one minimum; the following code gets a rough ballpark figure for x and y, which is good enough for what I'm looking for :)

import numpy as np

f = lambda x, y: -10*np.exp(-((x-1.3)**2 + (y-1.7)**2))

x = 0.0
y = 0.0
alpha = -0.1    # negative, so the `+=` updates below step downhill
x_diff = .1     # finite-difference step in x
y_diff = .1     # finite-difference step in y
n_iter = 10

while n_iter:
    # components of a (scaled) normal vector to the surface z = f(x, y),
    # built from forward differences
    n_x = y_diff * f(x, y) - y_diff * f(x + x_diff, y)
    n_y = x_diff * f(x, y) - x_diff * f(x, y + y_diff)
    n_z = x_diff * y_diff

    # normalize the normal vector
    n_x, n_y, n_z = np.array([n_x, n_y, n_z]) / np.linalg.norm([n_x, n_y, n_z])

    # recover the local slopes from the normal
    dzdx = -n_x/n_z
    dzdy = -n_y/n_z

    # print(f"x: {x:.4}, y: {y:.4}, dzdx: {dzdx:.4}, dzdy: {dzdy:.4}")

    x_step = dzdx * alpha
    y_step = dzdy * alpha
    x += x_step
    y += y_step
    n_iter -= 1

print(x, y, f(x, y))
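
For anyone comparing notes: the normal-vector construction above cancels out, so each update is just a plain forward-difference gradient step. A minimal equivalent sketch (same test function, same 10 steps, written directly; here `alpha` is a positive learning rate and the minus sign does the descending), which should follow the same trajectory:

import numpy as np

f = lambda x, y: -10 * np.exp(-((x - 1.3) ** 2 + (y - 1.7) ** 2))

x, y = 0.0, 0.0
alpha = 0.1   # positive learning rate
h = 0.1       # forward-difference step
for _ in range(10):
    dzdx = (f(x + h, y) - f(x, y)) / h   # approximate df/dx
    dzdy = (f(x, y + h) - f(x, y)) / h   # approximate df/dy
    x -= alpha * dzdx
    y -= alpha * dzdy
print(x, y, f(x, y))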