
In Professor Boyd's homework solution for the projection onto the unit simplex, he winds up with the following equation:

g_of_nu = (1/2)*torch.norm(-relu(-(x-nu)))**2 + nu*(torch.sum(x) -1) - x.size()[0]*nu**2 

g(ν) = (1/2)·‖max(ν·1 − x, 0)‖² + ν·(1ᵀx − 1) − n·ν²,   where n = dim(x)

If one calculates nu*, then the projection onto the unit simplex is y* = relu(x − nu*·1).

He suggests finding the maximizer of g_of_nu. Since g_of_nu is strictly concave, I negate it (calling the result f_of_nu) and find its global minimizer using gradient descent.
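
(For context, and restating the standard setup rather than adding anything new: g_of_nu is the Lagrange dual function of

minimize (1/2)·‖y − x‖²   subject to   1ᵀy = 1,  y ⪰ 0,

that is, g(ν) = min over y ⪰ 0 of (1/2)·‖y − x‖² + ν·(1ᵀy − 1), whose minimizer is y*(ν) = max(x − ν·1, 0); maximizing g over ν recovers ν*.)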


Question

My final vector y* does not add up to one. What am I doing wrong?


Code for replication

import torch
from torch.nn.functional import relu   # relu is used below

torch.manual_seed(1)
x = torch.randn(10)
x_list = x.tolist()
print(list(map(lambda x: round(x, 4), x_list)))

nu_0 = torch.tensor(0., requires_grad=True)   # scalar dual variable
nu = nu_0
optimizer = torch.optim.SGD([nu], lr=1e-1)

nu_old = torch.tensor(float('inf'))
steps = 100
eps = 1e-6
i = 1
while torch.norm(nu_old - nu) > eps:   # iterate until nu stops changing
  nu_old = nu.clone()
  optimizer.zero_grad()
  # f_of_nu is the negated g_of_nu from above (note the n*nu**2 last term)
  f_of_nu = -( (1/2)*torch.norm(-relu(-(x-nu)))**2 + nu*(torch.sum(x) -1) - x.size()[0]*nu**2 )
  f_of_nu.backward()
  optimizer.step()
  print(f'At step {i:2} the function value is {f_of_nu.item(): 1.4f} and nu={nu: 0.4f}' )
  i += 1
y_star = relu(x-nu).cpu().detach()
print(list(map(lambda x: round(x, 4), y_star.tolist())))
print(y_star.sum())
[0.6614, 0.2669, 0.0617, 0.6213, -0.4519, -0.1661, -1.5228, 0.3817, -1.0276, -0.5631]
At step  1 the function value is -1.9618 and nu= 0.0993
...
At step 14 the function value is -1.9947 and nu= 0.0665
[0.5948, 0.2004, 0.0, 0.5548, 0.0, 0.0, 0.0, 0.3152, 0.0, 0.0]
tensor(1.6652)
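
As an independent sanity check (this is not part of Boyd's solution), the Euclidean projection onto the unit simplex also has a well-known sort-based closed form (Held et al. 1974; Duchi et al. 2008). A minimal sketch, where project_onto_simplex is my own helper name:

import torch

def project_onto_simplex(x):
    # Sort entries in descending order and take running sums.
    u, _ = torch.sort(x, descending=True)
    css = torch.cumsum(u, dim=0)
    k = torch.arange(1, x.numel() + 1, dtype=x.dtype)
    # rho = number of strictly positive entries in the projection.
    rho = int((u - (css - 1) / k > 0).nonzero().max()) + 1
    nu = (css[rho - 1] - 1) / rho          # the optimal dual variable
    return torch.relu(x - nu), nu

torch.manual_seed(1)
x = torch.randn(10)
y, nu = project_onto_simplex(x)
print(nu, y.sum())                         # y.sum() is 1 up to float error

If the ν printed here disagrees with the ν ≈ 0.0665 found above, the objective being descended is suspect, not the optimizer.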

The function

import torch
from torch.nn.functional import relu
import numpy as np
import matplotlib.pyplot as plt

torch.manual_seed(1)
x = torch.randn(10)
nu = torch.linspace(-1, 1, steps=10000)

f = lambda x, nu: -( (1/2)*torch.norm(-relu(-(x-nu)))**2 + nu*(torch.sum(x) -1) - x.size()[0]*nu**2 )

f_value_list = np.asarray( [f(x, i) for i in nu.tolist()] )

i_min = np.argmin(f_value_list)
print(nu[i_min])

fig, ax = plt.subplots()

ax.plot(nu.cpu().detach().numpy(), f_value_list);

Here is the minimizer from the graph, which is consistent with the gradient-descent result.

tensor(0.0665)

[Plot of f_of_nu versus nu on [−1, 1]; the minimum sits near nu = 0.0665.]


1 Answer


The error comes from the derivation of the formula

g(ν) = (1/2)·‖max(ν·1 − x, 0)‖² + ν·(1ᵀx − 1) − n·ν²

from

g(ν) = min over y ⪰ 0 of (1/2)·‖y − x‖² + ν·(1ᵀy − 1),   minimized at y*(ν) = max(x − ν·1, 0).

If you develop the expression

(1/2)·‖y*(ν) − x‖² + ν·(1ᵀ y*(ν) − 1)

you will realize that the last term should be

−(n/2)·ν²

instead of

−n·ν²
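
For completeness, here is a sketch of that expansion (my own reconstruction, writing A = {i : x_i > ν} for the set of active coordinates and n for the length of x):

% plug y*(nu) = max(x - nu*1, 0) into the Lagrangian and split over A
\begin{aligned}
g(\nu) &= \tfrac{1}{2}\lVert y^\star(\nu) - x\rVert_2^2 + \nu\bigl(\mathbf{1}^\top y^\star(\nu) - 1\bigr) \\
       &= \tfrac{1}{2}\sum_{i \notin A} x_i^2 + \tfrac{1}{2}\,\lvert A\rvert\,\nu^2
          + \nu\sum_{i \in A} x_i - \lvert A\rvert\,\nu^2 - \nu \\
       &= \tfrac{1}{2}\bigl\lVert \max(\nu\mathbf{1} - x,\, 0)\bigr\rVert_2^2
          + \nu\bigl(\mathbf{1}^\top x - 1\bigr) - \tfrac{n}{2}\,\nu^2 ,
\end{aligned}

so the last term indeed carries a factor of 1/2.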

In short, this error comes from forgetting the 1/2 factor while expanding the squared norm. Once you make that change, everything works as intended:

import torch

torch.manual_seed(1)
x = torch.randn(10)

x_list = x.tolist()

nu_0 = torch.tensor(0., requires_grad = True)
nu = nu_0
optimizer = torch.optim.SGD([nu], lr=1e-1)

nu_old = torch.tensor(float('inf'))
steps = 100
eps = 1e-6
i = 1
while torch.norm(nu_old-nu) > eps:
  nu_old = nu.clone()
  optimizer.zero_grad()
  f_of_nu = -(0.5*torch.norm(-torch.relu(-(x-nu)))**2 + nu*(torch.sum(x) -1) -0.5*x.size()[0]*nu**2)
  f_of_nu.backward()
  optimizer.step()
  print(f'At step {i:2} the function value is {f_of_nu.item(): 1.4f} and nu={nu: 0.4f}' )
  i += 1

print(nu)   # the converged dual variable
y_star = torch.relu(x - nu).cpu().detach()
print(y_star)
print(list(map(lambda x: round(x, 4), y_star.tolist())))
print(y_star.sum())

And the output gives:

...
At step 25 the function value is -2.0721 and nu= 0.2328
tensor(0.2328, requires_grad=True)
tensor([0.4285, 0.0341, 0.0000, 0.3885, 0.0000, 0.0000, 0.0000, 0.1489, 0.0000,
        0.0000])
[0.4285, 0.0341, 0.0, 0.3885, 0.0, 0.0, 0.0, 0.1489, 0.0, 0.0]
tensor(1.0000)
    Thank you so much. I did not think Professor Boyd may has a typo:-). I am interested in neuroscience and would like to learn about it, but I could not find any contact info on your profile. I am a PhD student in applied mathematics. If you would like, we can meet and exchange some ideas. – Saeed Dec 22 '22 at 05:02
  • Yep, typos like that can be hard to spot and can happen at all levels. Happy to chat/answer questions anytime. I haven't really used the SO chat before but that might be a good place to start. – jylls Dec 22 '22 at 15:13
  • I have not used it either and had not heard of it before. We can arrange a meeting on Google Meet or whatever platform works best for you. The only thing is that I do not know how to share my contact information without posting it publicly. – Saeed Dec 22 '22 at 17:34