
I have just started learning deep learning, and I am stuck on gradient descent. I know how to implement batch gradient descent, and I understand in theory how mini-batch and stochastic gradient descent work, but I really can't understand how to implement them in code.

import numpy as np
# toy dataset: 4 examples, 3 input features
X = np.array([ [0,0,1],[0,1,1],[1,0,1],[1,1,1] ])
y = np.array([[0,1,1,0]]).T
alpha,hidden_dim = (0.5,4)
synapse_0 = 2*np.random.random((3,hidden_dim)) - 1    # input -> hidden weights
synapse_1 = 2*np.random.random((hidden_dim,1)) - 1    # hidden -> output weights
for j in range(60000):                                # xrange on Python 2
    # forward pass with sigmoid activations
    layer_1 = 1/(1+np.exp(-(np.dot(X,synapse_0))))
    layer_2 = 1/(1+np.exp(-(np.dot(layer_1,synapse_1))))
    # backpropagate the error
    layer_2_delta = (layer_2 - y)*(layer_2*(1-layer_2))
    layer_1_delta = layer_2_delta.dot(synapse_1.T) * (layer_1 * (1-layer_1))
    # full-batch update: the gradient is computed over all of X at once
    synapse_1 -= (alpha * layer_1.T.dot(layer_2_delta))
    synapse_0 -= (alpha * X.T.dot(layer_1_delta))

This is the sample code from Andrew Trask's blog; it's small and easy to understand. It implements batch gradient descent, but I would like to implement mini-batch and stochastic gradient descent on this sample. How could I do this? What do I have to add or modify in this code to implement mini-batch and stochastic gradient descent, respectively? Any help would be much appreciated. Thanks in advance. (I know this sample has only a few examples, whereas I would need a large dataset to split into mini-batches, but I would still like to know how to implement it.)

savan77
  • Just sample a mini-batch inside your for loop: rename the original X to "wholeX" (and y likewise), and inside the loop do X, y = sample(wholeX, wholeY, size), where sample is your function returning "size" random rows from wholeX, wholeY (see the sketch after these comments). – lejlot Jul 02 '16 at 10:20
  • Thanks. As you said, the function will return random rows, so isn't it possible that it returns the same rows multiple times? Would that cause a problem? And what if I put another for loop inside the outer loop and iterate it n times (n = number of mini-batches), so X, y are different each time? Is that OK? If so, how does it actually improve optimization? – savan77 Jul 02 '16 at 14:30
  • Random sampling without repetition is a typical solution, and it is not hard to achieve given that numpy.random has this kind of sampling implemented. Another for loop is fine (although not efficient, as Python loops are slow). The improvement in optimization comes from more mathematical reasons, way too long to express here. In short, it gives you many bad estimates of the gradient at the cost of one good one, which makes the optimization faster. – lejlot Jul 02 '16 at 14:46
  • Thanks. Heading over to more mathematical stuff. – savan77 Jul 02 '16 at 15:03
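
Applying lejlot's suggestion to the code above, a minimal sketch could look like this (the sample helper, the batch_size value, and the wholeX/wholeY names are illustrative, not from the original post; np.random.choice with replace=False draws rows without repetition):

import numpy as np
wholeX = np.array([ [0,0,1],[0,1,1],[1,0,1],[1,1,1] ])
wholeY = np.array([[0,1,1,0]]).T
alpha, hidden_dim, batch_size = 0.5, 4, 2
def sample(wholeX, wholeY, size):
    # pick `size` distinct random rows (sampling without repetition)
    idx = np.random.choice(wholeX.shape[0], size, replace=False)
    return wholeX[idx], wholeY[idx]
synapse_0 = 2*np.random.random((3,hidden_dim)) - 1
synapse_1 = 2*np.random.random((hidden_dim,1)) - 1
for j in range(60000):
    X, y = sample(wholeX, wholeY, batch_size)   # mini-batch instead of the full data
    layer_1 = 1/(1+np.exp(-np.dot(X,synapse_0)))
    layer_2 = 1/(1+np.exp(-np.dot(layer_1,synapse_1)))
    layer_2_delta = (layer_2 - y)*(layer_2*(1-layer_2))
    layer_1_delta = layer_2_delta.dot(synapse_1.T) * (layer_1 * (1-layer_1))
    synapse_1 -= alpha * layer_1.T.dot(layer_2_delta)
    synapse_0 -= alpha * X.T.dot(layer_1_delta)

With batch_size = 1 this becomes stochastic gradient descent, and with batch_size equal to the number of examples it reduces to the original batch version.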

2 Answers


This function returns the mini-batches given the inputs and targets:

def iterate_minibatches(inputs, targets, batchsize, shuffle=False):
    assert inputs.shape[0] == targets.shape[0]
    if shuffle:
        # shuffle indices once per epoch so every batch is a random subset
        indices = np.arange(inputs.shape[0])
        np.random.shuffle(indices)
    # note: leftover examples that do not fill a complete batch are dropped
    for start_idx in range(0, inputs.shape[0] - batchsize + 1, batchsize):
        if shuffle:
            excerpt = indices[start_idx:start_idx + batchsize]
        else:
            excerpt = slice(start_idx, start_idx + batchsize)
        yield inputs[excerpt], targets[excerpt]

and this shows how to use it for training:

for n in range(n_epochs):
    for x_batch, y_batch in iterate_minibatches(X, Y, batch_size, shuffle=True):
        l_train, acc_train = f_train(x_batch, y_batch)

    l_val, acc_val = f_val(Xt, Yt)
    logging.info('epoch %d, train_loss %f, acc %f, val_loss %f, acc %f',
                 n, l_train, acc_train, l_val, acc_val)

Obviously you need to define the f_train, f_val and other functions yourself given the optimisation library (e.g. Lasagne, Keras) you are using.
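
If you are not using such a library, a minimal plain-NumPy stand-in for f_train and f_val could look like the sketch below, reusing the question's two-layer network (the function bodies, the mean-squared-error loss and the 0.5 accuracy threshold are assumptions for illustration, not part of the original answer; synapse_0, synapse_1 and alpha come from the question's code):

def f_train(x_batch, y_batch):
    # one gradient step computed on this mini-batch only
    global synapse_0, synapse_1
    layer_1 = 1/(1+np.exp(-np.dot(x_batch, synapse_0)))
    layer_2 = 1/(1+np.exp(-np.dot(layer_1, synapse_1)))
    layer_2_delta = (layer_2 - y_batch)*(layer_2*(1-layer_2))
    layer_1_delta = layer_2_delta.dot(synapse_1.T) * (layer_1 * (1-layer_1))
    synapse_1 -= alpha * layer_1.T.dot(layer_2_delta)
    synapse_0 -= alpha * x_batch.T.dot(layer_1_delta)
    loss = np.mean((layer_2 - y_batch)**2)          # mean squared error
    acc = np.mean((layer_2 > 0.5) == y_batch)       # predictions thresholded at 0.5
    return loss, acc

def f_val(x, y):
    # forward pass only, no weight update
    layer_1 = 1/(1+np.exp(-np.dot(x, synapse_0)))
    layer_2 = 1/(1+np.exp(-np.dot(layer_1, synapse_1)))
    return np.mean((layer_2 - y)**2), np.mean((layer_2 > 0.5) == y)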

Ash

The following function returns (yields) mini-batches. It is based on the function provided by Ash, but correctly handles the last minibatch.

def iterate_minibatches(inputs, targets, batchsize, shuffle=False):
    assert inputs.shape[0] == targets.shape[0]
    if shuffle:
        indices = np.arange(inputs.shape[0])
        np.random.shuffle(indices)
    for start_idx in range(0, inputs.shape[0], batchsize):
        # the last batch may be smaller than batchsize instead of being dropped
        end_idx = min(start_idx + batchsize, inputs.shape[0])
        if shuffle:
            excerpt = indices[start_idx:end_idx]
        else:
            excerpt = slice(start_idx, end_idx)
        yield inputs[excerpt], targets[excerpt]
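
For example (an illustrative check, not part of the original answer), with the question's 4-example dataset and batchsize=3 this version yields a batch of 3 followed by a batch of 1 instead of silently dropping the leftover row; passing batchsize=1 to either generator gives plain stochastic gradient descent:

for x_batch, y_batch in iterate_minibatches(X, y, 3, shuffle=False):
    print(x_batch.shape, y_batch.shape)   # (3, 3) (3, 1), then (1, 3) (1, 1)
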
dsachar