I'm trying to implement stochastic gradient descent (SGD) in MATLAB, but something is going wrong. I suspect my convergence check is incorrect, since I wasn't sure how to update the estimate at each iteration, but I can't pin it down. I'm only trying to fit simple linear data, yet the results are far off. Could someone point out where I'm going wrong and why this isn't working?
Thanks!
Here is the data setup and the script that calls the SGD function:
clear all;
close all;
clc
N_features = 2;
d = 100;
m = 100;
X_train = 10*rand(d,1);
X_test = 10*rand(d,1);
X_train = [ones(d,1) X_train];
X_test = [ones(d,1) X_test];
y_train = 5 + X_train(:,2) + 0.5*randn(d,1);
y_test = 5 + X_test(:,2) + 0.5*randn(d,1);
gamma = 0.01; %learning rate
[sgd_est_train,sgd_est_test,SSE_train,SSE_test,w] = stoch_grad(d,m,N_features,X_train,y_train,X_test,y_test,gamma);
figure(1)
plot(X_train(:,2),sgd_est_train,'ro',X_train(:,2),y_train,'go')
figure(2)
plot(X_test(:,2),sgd_est_test,'bo',X_test(:,2),y_test,'go')
The function that implements SGD is:
% stochastic gradient descent
function [sgd_est_train,sgd_est_test,SSE_train,SSE_test,w] = stoch_grad(d,m,N_features,X_train,y_train,X_test,y_test,gamma)
epsilon = 0.01; %convergence criterion
max_iter = 10000;
w0 = zeros(N_features,1); %initial guess
w = zeros(N_features,1); %for convenience
x = zeros(d,1);
z = zeros(d,1);
for jj = 1:max_iter
    for kk = 1:d
        x = X_train(kk,:)';
        z = gamma*((w0'*x-y_train(kk))*x);
        w = w0 - z;
    end
    if norm(w0-w,2) < epsilon
        break;
    else
        w0 = w;
    end
end
sgd_est_test = zeros(m,1);
sgd_est_train = zeros(d,1);
for ll = 1:m
    sgd_est_test(ll,1) = w'*X_test(ll,:)';
end
for ii = 1:d
    sgd_est_train(ii,1) = w'*X_train(ii,:)';
end
SSE_test = sum((sgd_est_test - y_test).^2);
SSE_train = sum((sgd_est_train - y_train).^2);
end
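For comparison with the function above, here is a minimal sketch of one common way to structure per-sample SGD for least squares: the weight vector is updated in place after every sample, and the convergence check compares the weights before and after a full pass over the data. This is an illustration under stated assumptions, not a definitive implementation. The names `w_prev`, `n_epochs`, and `y_hat` are hypothetical, the samples are visited in a fixed order, and the values of `gamma`, `epsilon`, and `max_iter` are simply copied from the post.

```matlab
% Sketch only: per-sample SGD for least squares, with w updated in place.
% Data setup mirrors the post; w_prev, n_epochs and y_hat are illustrative names.
d = 100;
X = [ones(d,1) 10*rand(d,1)];            % design matrix with intercept column
y = 5 + X(:,2) + 0.5*randn(d,1);         % true intercept 5, true slope 1

gamma    = 0.01;                         % learning rate (value from the post)
epsilon  = 0.01;                         % tolerance on the change in w per pass
max_iter = 10000;                        % maximum number of passes over the data

w = zeros(2,1);                          % initial guess
for n_epochs = 1:max_iter
    w_prev = w;                          % weights at the start of this pass
    for kk = 1:d
        x = X(kk,:)';                    % kk-th sample as a column vector
        w = w - gamma*(w'*x - y(kk))*x;  % gradient step uses the current w
    end
    if norm(w - w_prev, 2) < epsilon     % compare weights across a full pass
        break;
    end
end

y_hat = X*w;                             % vectorized predictions
SSE   = sum((y_hat - y).^2);             % sum of squared errors on this data
fprintf('passes: %d, w = [%.3f %.3f], SSE = %.3f\n', n_epochs, w(1), w(2), SSE);
```

Because each step inside the inner loop reads the current `w`, every sample contributes to the result of a pass. With a constant learning rate the final weights still depend on the order of the samples, so a smaller `gamma` or a shuffled order (an option not shown here) may be worth trying.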