
This network contains an input layer and an output layer, with no nonlinearities, so the output is just a linear combination of the input. I am using a regression loss to train the network. I generated some random 1D test data according to a simple linear function, with Gaussian noise added. The problem is that the loss doesn't converge to zero.

import numpy as np
import matplotlib.pyplot as plt

n = 100
alp = 1e-4
a0 = np.random.randn(100,1) # Also x
y = 7*a0+3+np.random.normal(0,1,(100,1))

w = np.random.randn(100,100)*0.01
b = np.random.randn(100,1)

def compute_loss(a1,y,w,b):
    return np.sum(np.power(y-w*a1-b,2))/2/n

def gradient_step(w,b,a1,y):

    w -= (alp/n)*np.dot((a1-y),a1.transpose())
    b -= (alp/n)*(a1-y)  
    return w,b

loss_vec = []
num_iterations = 10000

for i in range(num_iterations):

    a1 = np.dot(w,a0)+b
    loss_vec.append(compute_loss(a1,y,w,b))
    w,b = gradient_step(w,b,a1,y)
plt.plot(loss_vec)
srkdb

2 Answers


The convergence also depends on the value of alpha you use. I played with your code a bit and for

alp = 5e-3

I get the following convergence, plotted on a logarithmic x-axis with

plt.semilogx(loss_vec)

Output

[loss convergence plot]
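Since the right alpha is problem-dependent, one way to pick it is to wrap the training loop in a function and sweep several values. Below is a rough sketch of that idea; the train() helper and the particular alpha values tried are my own choices, and the loop inside it is your code unchanged:

import numpy as np
import matplotlib.pyplot as plt

n = 100
a0 = np.random.randn(n, 1)  # same data setup as in the question
y = 7*a0 + 3 + np.random.normal(0, 1, (n, 1))

def train(alp, num_iterations=10000):
    # the question's training loop, wrapped so it can be re-run per learning rate
    w = np.random.randn(n, n)*0.01
    b = np.random.randn(n, 1)
    loss_vec = []
    for _ in range(num_iterations):
        a1 = np.dot(w, a0) + b
        loss_vec.append(np.sum(np.power(y - w*a1 - b, 2))/2/n)
        w -= (alp/n)*np.dot((a1 - y), a1.transpose())
        b -= (alp/n)*(a1 - y)
    return loss_vec

for alp in [5e-3, 1e-3, 5e-4, 1e-4]:
    plt.semilogx(train(alp), label="alpha = {}".format(alp))
plt.xlabel("iteration")
plt.ylabel("loss")
plt.legend()
plt.show()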

Sheldore
  • I admit that, but you will have to play around with alpha values. What you can do is generate a range of alpha values, put your code in a function, call it for each alpha, and plot the convergence to see which value is optimal. – Sheldore Sep 16 '18 at 16:28
  • Yes, looks good to me. Nothing much to shorten further. Try generating alpha = 0.1, 0.5, 0.01, 0.05, 0.001, 0.005, 0.0001 and so on. – Sheldore Sep 16 '18 at 16:32
  • Alright. Also, I think it's not reasonable to expect near zero loss without any hidden layers, right? – srkdb Sep 16 '18 at 16:33
  • Hmm, I can't comment on that due to my almost negligible knowledge of NN. My comment about alpha was based on other minimization problems I have worked on. – Sheldore Sep 16 '18 at 16:34

If I understand your code correctly, you only have one weight matrix and one bias vector despite the fact that you have 2 layers. This is odd and might be at least part of your problem.
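For comparison, here is a minimal sketch of what a plain 1-input/1-output linear fit looks like with a single scalar weight and bias, the loss computed on the predictions directly, and the corresponding gradients (this is my own rewrite, not your code, and the learning rate is an arbitrary choice):

import numpy as np

n = 100
alp = 1e-2                                       # learning rate, chosen by hand
a0 = np.random.randn(n, 1)                       # x, one feature per sample
y = 7*a0 + 3 + np.random.normal(0, 1, (n, 1))    # targets with unit-variance noise

w = np.random.randn()*0.01                       # scalar weight
b = 0.0                                          # scalar bias

for i in range(10000):
    a1 = w*a0 + b                                # predictions, shape (n, 1)
    loss = np.sum((y - a1)**2)/(2*n)             # loss on the predictions directly
    err = a1 - y
    w -= (alp/n)*np.sum(err*a0)                  # dL/dw = mean((a1 - y)*x)
    b -= (alp/n)*np.sum(err)                     # dL/db = mean(a1 - y)

print(w, b, loss)                                # w -> ~7, b -> ~3

Note that even a perfect fit leaves a loss of roughly 0.5 here, since the targets carry unit-variance Gaussian noise and the loss is half the mean squared error, so you shouldn't expect it to reach zero.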

JacKeown
  • Yes, one weight matrix and one bias vector. There's no hidden layer. Just the input and output layers. What strikes as still odd? – srkdb Sep 16 '18 at 16:29
  • You have one weight matrix, w, but then your compute_loss function is wonky: you have `np.sum(np.power(y-w*a1-b,2))/2/n` where you should have `np.sum(np.power(y-a1,2))/2/n`. – JacKeown Sep 19 '18 at 14:28