
I'm trying to implement stochastic gradient descent in MATLAB, but I'm going wrong somewhere. I suspect the way I'm checking for convergence is incorrect (I wasn't quite sure how to update the estimator with each iteration), but I'm not sure. I've been trying just to fit basic linear data, but the results I'm getting are pretty far off, and I'm hoping to get some help. Would someone be able to point out where I'm going wrong, and why this isn't working correctly?

Thanks!

Here is the data set up and general code:

clear all;
close all;
clc

N_features = 2; %number of features (bias term + one input)
d = 100;        %number of training samples
m = 100;        %number of test samples

X_train = 10*rand(d,1);
X_test = 10*rand(d,1);
X_train = [ones(d,1) X_train];
X_test = [ones(d,1) X_test];

y_train = 5 + X_train(:,2) + 0.5*randn(d,1);
y_test = 5 + X_test(:,2) + 0.5*randn(d,1);

gamma = 0.01; %learning rate

[sgd_est_train,sgd_est_test,SSE_train,SSE_test,w] = stoch_grad(d,m,N_features,X_train,y_train,X_test,y_test,gamma);

figure(1)
plot(X_train(:,2),sgd_est_train,'ro',X_train(:,2),y_train,'go')

figure(2)
plot(X_test(:,2),sgd_est_test,'bo',X_test(:,2),y_test,'go')

and the function that actually implements the SGD is:

% stochastic gradient descent

function [sgd_est_train,sgd_est_test,SSE_train,SSE_test,w] = stoch_grad(d,m,N_features,X_train,y_train,X_test,y_test,gamma)

    epsilon = 0.01; %convergence criterion
    max_iter = 10000;

    w0 = zeros(N_features,1); %initial guess
    w = zeros(N_features,1); %for convenience

    x = zeros(d,1);
    z = zeros(d,1);

    for jj=1:max_iter;
        for kk=1:d;
            x = X_train(kk,:)';
            z = gamma*((w0'*x-y_train(kk))*x);
            w = w0 - z;
        end

        if norm(w0-w,2)<epsilon
            break;
        else
            w0 = w;
        end
    end

    sgd_est_test = zeros(m,1);
    sgd_est_train = zeros(d,1);

    for ll=1:m;
        sgd_est_test(ll,1) = w'*X_test(ll,:)';
    end

    for ii=1:d;
        sgd_est_train(ii,1) = w'*X_train(ii,:)';
    end

    SSE_test = sum((sgd_est_test - y_test).^2);
    SSE_train = sum((sgd_est_train - y_train).^2);

end
poppy3345
  • Can you please describe the variables? What do they mean? I may be able to help, but can't unless I know exactly how you are doing things. – Ander Biguri Oct 10 '16 at 08:19
  • Example: What is your update equation? Can you write the gradient for me in maths form? – Ander Biguri Oct 10 '16 at 08:27
  • I tested it a bit: I am 98% confident that your update of `w` is wrong, i.e. the `z=....` line. If you remove noise from the data and give the exact solution, it works. If you change `w0(2)` a bit, then it will find the right value, but if you change `w0(1)`, then it will not converge. – Ander Biguri Oct 10 '16 at 09:35
  • Hi @AnderBiguri, sorry about that, I am pretty new to this site and wasn't sure exactly what to write. The update equation that I'm using is `w = w_0 - \gamma (w^T x_i - y_i) x_i`, where `w` is the updated estimator, `w_0` is the previous guess for the estimator, `\gamma` is the learning rate, `x_i` is the `i`th sample of the feature vector, and `y_i` is the `i`th sample of the function you want to learn (a short derivation of this update follows below). – poppy3345 Oct 10 '16 at 16:11
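
For readers following along, the update in that last comment is ordinary SGD on the per-sample squared loss; a quick derivation, using the same symbols (my own rendering, not taken from the original posts):

`\ell_i(w) = (1/2)(w^T x_i - y_i)^2`     (loss on sample `i`)
`\nabla_w \ell_i(w) = (w^T x_i - y_i) x_i`     (its gradient)
`w \leftarrow w - \gamma (w^T x_i - y_i) x_i`     (one SGD step)

Note that `w` on the right-hand side is the current estimate, so it has to be refreshed after every sample, not once per pass over the data.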

1 Answer


I tried lowering the learning rate to 0.001 and got this: Imgur
That tells me that your algorithm produces an estimate of the form y = ax instead of y = ax + b (for some reason it ignores the constant term), and also that you need to lower the learning rate in order for it to converge.
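
Building on the comment above that the `z = ...` update is the problem, here is a sketch of how the loop could be restructured so that the estimate is updated after every sample, with `w0` kept only as an epoch-start snapshot for the convergence check. This is a minimal sketch reusing the question's variable names, not tested against the original data:

w = zeros(N_features,1);                       % initial guess
for jj = 1:max_iter
    w0 = w;                                    % snapshot at the start of the pass
    for kk = 1:d
        x = X_train(kk,:)';                    % kth sample (bias term + feature)
        w = w - gamma*(w'*x - y_train(kk))*x;  % step from the CURRENT estimate
    end
    if norm(w - w0, 2) < epsilon               % compare against the snapshot
        break;
    end
end

With the data as generated in the question (`y = 5 + x + noise`), the recovered weights should then come out near `w = [5; 1]`.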

CodeBlackj
  • I have also tried taking out the y-intercept, and you're right - it doesn't get the y-intercept value at all! But I've also run it a bunch of times with a lower learning rate, and even though it seems to help in some cases, it still produces weird estimates in others. – poppy3345 Oct 10 '16 at 16:24