
I am playing with Rule 110 of the Wolfram cellular automata. Given a line of zeros and ones, you can calculate the next line with these rules:

[image: Rule 110 transition table: 111→0, 110→1, 101→1, 100→0, 011→1, 010→1, 001→1, 000→0]

Starting with 00000000....1 (all zeros with a single 1 at the end), you get this sequence:

[image: rows produced by repeatedly applying Rule 110 to the 00000000....1 initial row]

Just out of curiosity, I decided to approximate these rules with a polynomial, so that cells can be not only 0 and 1 but also any gray value in between:

def triangle(x, y, z, v0):
    # polynomial that equals the Rule 110 output for binary neighbours (x, y, z)
    v = (y + y * y + y * y * y - 3. * (1. + x) * y * z + z * (1. + z + z * z)) / 3.
    # squared residual: zero iff v0 is the correct next state of the centre cell
    return (v - v0) * (v - v0)

so if x, y, z and v0 match any of the rules from the table, it returns 0, and a positive nonzero value otherwise.
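As a quick sanity check (my own snippet, not part of the original script), the polynomial can be compared against the Rule 110 table on all eight binary neighbourhoods:

from itertools import product

# Rule 110 lookup: (left, centre, right) -> next value of the centre cell
RULE_110 = {(1, 1, 1): 0, (1, 1, 0): 1, (1, 0, 1): 1, (1, 0, 0): 0,
            (0, 1, 1): 1, (0, 1, 0): 1, (0, 0, 1): 1, (0, 0, 0): 0}

def rule_poly(x, y, z):
    # same polynomial as in triangle(), without the squared residual
    return (y + y * y + y * y * y - 3. * (1. + x) * y * z + z * (1. + z + z * z)) / 3.

for x, y, z in product((0, 1), repeat=3):
    assert abs(rule_poly(x, y, z) - RULE_110[(x, y, z)]) < 1e-12
print("polynomial reproduces Rule 110 on all 8 binary inputs")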

Next I added all possible groups of four cells (three neighbours and the resulting cell below them) into a single sum, which is zero for integer solutions:

def eval():
    # sum of squared Rule 110 residuals over every (left, centre, right) -> result group,
    # with periodic boundary conditions in the horizontal direction
    s = 0.
    for i in range(W - 1):
        for j in range(1, W + 1):
            xx = x[i, (j - 1) % W]   # left neighbour
            yy = x[i, j % W]         # centre cell
            zz = x[i, (j + 1) % W]   # right neighbour
            r = x[i + 1, j % W]      # cell below, i.e. the next state
            s += triangle(xx, yy, zz, r)
    # constraints on the first row: all cells 0 except the last one, which is 1
    for j in range(W - 1): s += x[0, j] * x[0, j]
    s += (1 - x[0, W - 1]) * (1 - x[0, W - 1])
    return torch.sqrt(s)

Also, at the bottom of this function I add the conditions for the first line, so that all elements are 0 except the last one, which is 1. Finally, I decided to minimize this sum of squares over a W×W matrix with PyTorch:

import torch
from torch.autograd import Variable  # Variable is deprecated in recent PyTorch; a tensor with requires_grad=True also works

x = Variable(torch.DoubleTensor(W, W).zero_(), requires_grad=True)
opt = torch.optim.LBFGS([x], lr=.1)
for i in range(15500):
    def closure():
        # LBFGS needs a closure that re-evaluates the objective and its gradient
        opt.zero_grad()
        s = eval()
        s.backward()
        return s
    opt.step(closure)
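As a reference for what the optimiser should converge to, the exact 0/1 evolution can also be generated directly; here is a minimal sketch of my own, assuming the same W×W layout with the single 1 in the top-right corner and periodic boundaries:

import numpy as np

W = 10
target = np.zeros((W, W), dtype=int)
target[0, W - 1] = 1                        # first row: 00...01
for i in range(W - 1):
    for j in range(W):
        l = target[i, (j - 1) % W]          # left neighbour (periodic)
        c = target[i, j]                    # centre cell
        r = target[i, (j + 1) % W]          # right neighbour (periodic)
        # Rule 110: the new cell is 0 for 111, 100 and 000, and 1 otherwise
        target[i + 1, j] = 0 if (l, c, r) in ((1, 1, 1), (1, 0, 0), (0, 0, 0)) else 1

Plugging this matrix into eval() gives (numerically) zero, which is the global minimum the optimiser is chasing.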

Here is the full code, you can try it yourself. The problem is that for a 10×10 board it converges to the correct solution in ~20 steps:

[image: converged 10×10 solution]

But if I take a 15×15 board, it never finishes converging:

[image: 15×15 board that never fully converges, with the sum-of-squares curve on the right]

The graph on the right shows how the sum of squares changes with each iteration, and you can see that it never reaches zero. My question is why this happens and how I can fix it. I tried different PyTorch optimisers, but all of them perform worse than LBFGS. I also tried different learning rates. Any ideas why this happens and how I can reach the final point during optimisation?

UPD: improved convergence graph, log of the sum of squares:

[image: log of the sum of squares vs. iteration]

UPD2: I also tried doing the same in C++ with dlib, and I don't have any convergence issues there; it goes much deeper in much less time:

[image: dlib/BFGS convergence graph]

I am using this code for optimisation in C++:

// BFGS with numerical gradients and a (very tight) delta-based stopping criterion
find_min_using_approximate_derivatives(bfgs_search_strategy(),
        objective_delta_stop_strategy(1e-87),
        s, x, -1);

1 Answer


What you're trying to do here is non-convex optimisation, and this is a notoriously difficult problem. Once you think about it, it makes sense, because just about any practical mathematical problem can be formulated as an optimisation problem.

1. Prelude
So, before giving you hints as to where to find a solution to your particular problem, I want to illustrate why certain optimisation problems are easy to solve.

I'm going to start by discussing convex problems. These are easy to solve even in the constrained case, and the reason for this is that when you compute the gradient you actually get a lot of information about where the minimum cannot be (the first-order Taylor expansion of a convex function f is always an underestimate of f). Additionally, there is only one minimum and there are no saddle points. If you're interested in learning more about convex optimisation, I recommend Stephen Boyd's class on convex optimisation on YouTube.
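In symbols, the underestimate property mentioned above is just the first-order condition for convexity:

    f(y) >= f(x) + ∇f(x)^T (y - x)    for all x, y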

Now then, if non-convex optimisation is so difficult, how come we are able to solve it in deep learning? The answer is simply that the non-convex function we are minimising in deep learning is quite nice, as demonstrated by Henaff et al.

It is therefore important that machine learning practitioners realise that the optimisation procedures used in deep learning will most likely not yield a good minimum on other non-convex problems, if they converge to a minimum at all.

2. Answer to your question
Now then, to answer your question: you're probably not going to find a fast solution, as non-convex optimisation is NP-hard in general. But fear not, SciPy has a few global optimisation algorithms to choose from. Here is a link to another Stack Overflow thread with a good answer to your question.
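For example, here is a minimal sketch of my own (not taken from the linked thread) of how the same sum-of-squares objective, rewritten with NumPy, could be handed to scipy.optimize.basinhopping; the board size W is assumed to be small, since this is slow:

import numpy as np
from scipy.optimize import basinhopping

W = 10

def objective(flat):
    # flat vector -> W x W board, then the same Rule 110 residual as in the question
    x = flat.reshape(W, W)
    s = 0.0
    for i in range(W - 1):
        for j in range(W):
            xx, yy, zz = x[i, (j - 1) % W], x[i, j], x[i, (j + 1) % W]
            v = (yy + yy**2 + yy**3 - 3. * (1. + xx) * yy * zz + zz * (1. + zz + zz**2)) / 3.
            s += (v - x[i + 1, j]) ** 2
    # first-row constraints: all zeros except the last cell, which is 1
    s += np.sum(x[0, :-1] ** 2) + (1. - x[0, -1]) ** 2
    return s

# niter kept small here because the pure-Python objective is expensive
result = basinhopping(objective, np.zeros(W * W), niter=10)
print(result.fun)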

3. Moral of the story
Finally, I want to remind you that convergence guarantees are important; forgetting them has led to an oil rig collapsing.

PS. Please forgive typos, I'm using my phone for this.

Update: As to why BFGS works with dlib, there might be two reasons: firstly, BFGS is better at using curvature information than L-BFGS, and secondly, it uses a line search to find an optimal step size. I'd recommend checking whether PyTorch allows line searches and, if not, setting a decreasing step size (or just a really low one).
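For what it's worth, newer versions of PyTorch do expose a line search for LBFGS through the line_search_fn argument (as far as I know, 'strong_wolfe' is the only supported value), so something along these lines may be worth a try:

# sketch: LBFGS with a strong Wolfe line search (requires a reasonably recent PyTorch)
opt = torch.optim.LBFGS([x], lr=1., max_iter=20, line_search_fn='strong_wolfe')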
