I'm struggling to turn the gradient descent function I already have into one for stochastic gradient descent. I have the following:
```r
gd <- function(f, grad, y, X, theta0, npars, ndata, a, niters) {
  theta <- matrix(data = NA, nrow = niters, ncol = npars)  # one row of parameters per iteration
  cost <- vector(mode = "numeric", length = niters)        # cost at each iteration
  theta[1, ] <- theta0
  cost[1] <- f(y, X, theta0, ndata)
  for (i in 2:niters) {
    # Full-batch step: the gradient uses all ndata rows
    theta[i, ] <- theta[i-1, ] - a * grad(y, X, theta[i-1, ], ndata)
    cost[i] <- f(y, X, theta[i, ], ndata)
  }
  return(list(theta = theta, cost = cost))
}
```
This code works fine.
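`f` and `grad` themselves aren't shown here. As a hypothetical example only (assuming a least-squares cost, which may not match the real functions), a pair with the argument order `gd()` uses could look like this, where `n` is the number of rows each average is taken over. The names `mse_cost` and `mse_grad` and the simulated data are made up; the finite-difference check at the end just confirms that the gradient agrees with the cost.

```r
# Hypothetical least-squares cost and gradient with the argument order
# gd() uses: f(y, X, theta, n) and grad(y, X, theta, n).
# n is the number of rows the average is taken over.
mse_cost <- function(y, X, theta, n) {
  r <- X %*% theta - y            # residuals, an n x 1 matrix
  sum(r^2) / (2 * n)
}
mse_grad <- function(y, X, theta, n) {
  r <- X %*% theta - y
  as.vector(t(X) %*% r) / n       # plain vector of length npars
}

# Made-up data, for illustration only
set.seed(1)
n <- 50
X <- cbind(1, rnorm(n))
y <- as.vector(X %*% c(2, -1) + rnorm(n, sd = 0.1))
theta <- c(0.5, 0.5)

# Central finite differences of the cost, one parameter at a time
h <- 1e-6
num_grad <- sapply(1:2, function(j) {
  e <- replace(numeric(2), j, h)
  (mse_cost(y, X, theta + e, n) - mse_cost(y, X, theta - e, n)) / (2 * h)
})
print(cbind(analytic = mse_grad(y, X, theta, n), numeric = num_grad))
```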
I'm trying to change it so that, instead of using all `ndata <- 1000` points, I use 100 randomly sampled points. I tried changing the second part (the loop) to:
```r
for (i in 2:niters) {
  samp <- sample(ndata, nsubsamples)
  theta[i, ] <- theta[i-1, ] - a*grad(y[samp,], X[samp,], theta[i-1, ], nsubsamples)
  cost[i] <- f(y, X, theta[i, ], nsubsamples)
}
```
but I get an error saying:

`Error in y[samp, ] : incorrect number of dimensions`
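A small reproduction of that error, assuming `y` is a plain numeric vector and `X` is a matrix (both filled with made-up values here): a vector has no `dim` attribute, so it only accepts a single index, while a matrix takes `[rows, cols]`. Using `drop = FALSE` on `X` keeps it a matrix even if it has only one column.

```r
# y stands in for a single column pulled out of a data frame: a plain vector.
set.seed(42)
y <- rnorm(1000)
X <- cbind(1, rnorm(1000))
samp <- sample(1000, 100)

print(dim(y))   # NULL: y has no dimensions

# Matrix-style indexing of a vector fails with the error from the question
err <- tryCatch(y[samp, ], error = function(e) conditionMessage(e))
print(err)

y_sub <- y[samp]                  # vectors take a single index
X_sub <- X[samp, , drop = FALSE]  # matrices take [rows, cols]
print(c(length(y_sub), nrow(X_sub)))
```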
My `y` is a column from a dataset called `simulated_data` with 1000 observations. When I try to get 100 random samples from it (`nsubsamples = 100` and `ndata = 1000`), `simulated_data[samp,]$y` works but `simulated_data$y[samp,]` does not. However, my `y` has to be defined as `simulated_data$y`.
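To illustrate the difference, here is a sketch with a hypothetical stand-in for `simulated_data` (random columns `x` and `y`, not the real data): selecting rows first and then the column gives the same values as taking the column (a vector) first and indexing it with a single `[samp]`.

```r
# Hypothetical stand-in for simulated_data (made-up columns)
set.seed(7)
simulated_data <- data.frame(x = rnorm(1000), y = rnorm(1000))
samp <- sample(nrow(simulated_data), 100)

from_rows   <- simulated_data[samp, ]$y  # subset rows, then take the column
from_column <- simulated_data$y[samp]    # take the column (a vector), then index it
print(identical(from_rows, from_column))
```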
So I'm wondering whether there's an easier way to take a random sample and, once I've done that, whether the rest of my code is correct (I'm a bit confused about whether I should be using `ndata` or `nsubsamples` for `theta[i, ]` and `cost[i]`).
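One possible mini-batch variant, sketched under the assumption that `f` and `grad` average over the `n` they are given (as the hypothetical least-squares pair below does): the gradient is computed on the sampled rows, so it is divided by `nsubsamples`, while the cost is evaluated on the full `y` and `X`, so it is divided by `ndata`. The function name `sgd`, its extra `nsubsamples` argument, the learning rate and the simulated data are all illustrative choices, not a definitive implementation.

```r
# Hypothetical least-squares cost and gradient: f(y, X, theta, n), grad(y, X, theta, n)
mse_cost <- function(y, X, theta, n) sum((X %*% theta - y)^2) / (2 * n)
mse_grad <- function(y, X, theta, n) as.vector(t(X) %*% (X %*% theta - y)) / n

# Mini-batch SGD sketch; y is a plain vector, X is a matrix
sgd <- function(f, grad, y, X, theta0, npars, ndata, nsubsamples, a, niters) {
  theta <- matrix(data = NA, nrow = niters, ncol = npars)
  cost <- vector(mode = "numeric", length = niters)
  theta[1, ] <- theta0
  cost[1] <- f(y, X, theta0, ndata)
  for (i in 2:niters) {
    samp <- sample(ndata, nsubsamples)  # fresh random rows each iteration
    # Gradient on the sampled rows only, averaged over nsubsamples
    g <- grad(y[samp], X[samp, , drop = FALSE], theta[i-1, ], nsubsamples)
    theta[i, ] <- theta[i-1, ] - a * g
    # Cost on the full data set, averaged over ndata
    cost[i] <- f(y, X, theta[i, ], ndata)
  }
  list(theta = theta, cost = cost)
}

# Made-up data with known coefficients c(2, -1)
set.seed(123)
X <- cbind(1, rnorm(1000))
y <- as.vector(X %*% c(2, -1) + rnorm(1000, sd = 0.5))
fit <- sgd(mse_cost, mse_grad, y, X, theta0 = c(0, 0), npars = 2,
           ndata = 1000, nsubsamples = 100, a = 0.1, niters = 500)
print(fit$theta[500, ])     # compare with the true coefficients c(2, -1)
print(fit$cost[c(1, 500)])  # full-data cost at the first and last iteration
```

Because a new `samp` is drawn inside the loop, each step sees a different subset, while the `cost` trace stays comparable across iterations since it is always computed on all `ndata` rows.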