
I have just started learning deep learning, and I am stuck on gradient descent. I know how to implement batch gradient descent, and I understand in theory how mini-batch and stochastic gradient descent work, but I really can't understand how to implement them in code.

import numpy as np
# toy dataset: 4 examples, 3 input features
X = np.array([ [0,0,1],[0,1,1],[1,0,1],[1,1,1] ])
y = np.array([[0,1,1,0]]).T
alpha,hidden_dim = (0.5,4)
synapse_0 = 2*np.random.random((3,hidden_dim)) - 1    # input -> hidden weights
synapse_1 = 2*np.random.random((hidden_dim,1)) - 1    # hidden -> output weights
for j in range(60000):                                # xrange on Python 2
    # forward pass with sigmoid activations
    layer_1 = 1/(1+np.exp(-(np.dot(X,synapse_0))))
    layer_2 = 1/(1+np.exp(-(np.dot(layer_1,synapse_1))))
    # backpropagate the error
    layer_2_delta = (layer_2 - y)*(layer_2*(1-layer_2))
    layer_1_delta = layer_2_delta.dot(synapse_1.T) * (layer_1 * (1-layer_1))
    # full-batch update: the gradient is computed over all of X at once
    synapse_1 -= (alpha * layer_1.T.dot(layer_2_delta))
    synapse_0 -= (alpha * X.T.dot(layer_1_delta))

This is the sample code from Andrew Trask's blog; it's small and easy to understand. It implements batch gradient descent, but I would like to implement mini-batch and stochastic gradient descent on this sample. How could I do this? What do I have to add or modify in this code to implement mini-batch and stochastic gradient descent, respectively? Any help would be much appreciated. Thanks in advance. (I know this sample has only a few examples, whereas I would need a large dataset to split into mini-batches, but I would still like to know how to implement it.)

savan77
  • Just sample a mini-batch inside your for loop: rename the original X to "wholeX" (and y likewise), and inside the loop do X, y = sample(wholeX, wholeY, size), where sample is your function returning "size" random rows from wholeX, wholeY (see the sketch after these comments). – lejlot Jul 02 '16 at 10:20
  • Thanks. As you said, the function will return random rows, so isn't it possible that it returns the same rows multiple times? Would that cause a problem? And what if I put another for loop inside the outer loop and iterate it n times (n = number of mini-batches), so X, y are different each time? Is that OK? If so, how does it actually improve optimization? – savan77 Jul 02 '16 at 14:30
  • Random sampling without repetition is a typical solution, and it is not hard to achieve given that numpy.random has this kind of sampling implemented. Another for loop is fine (although not efficient, as Python loops are slow). The improvement in optimization comes from more mathematical reasons, way too long to express here. In short, it gives you many bad estimates of the gradient at the cost of one good one, which makes the optimization faster. – lejlot Jul 02 '16 at 14:46
  • Thanks. Heading over to more mathematical stuff. – savan77 Jul 02 '16 at 15:03
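
Applying lejlot's suggestion to the code above, a minimal sketch could look like this (the sample helper, the batch_size value, and the wholeX/wholeY names are illustrative, not from the original post; np.random.choice with replace=False draws rows without repetition):

import numpy as np
wholeX = np.array([ [0,0,1],[0,1,1],[1,0,1],[1,1,1] ])
wholeY = np.array([[0,1,1,0]]).T
alpha, hidden_dim, batch_size = 0.5, 4, 2
def sample(wholeX, wholeY, size):
    # pick `size` distinct random rows (sampling without repetition)
    idx = np.random.choice(wholeX.shape[0], size, replace=False)
    return wholeX[idx], wholeY[idx]
synapse_0 = 2*np.random.random((3,hidden_dim)) - 1
synapse_1 = 2*np.random.random((hidden_dim,1)) - 1
for j in range(60000):
    X, y = sample(wholeX, wholeY, batch_size)   # mini-batch instead of the full data
    layer_1 = 1/(1+np.exp(-np.dot(X,synapse_0)))
    layer_2 = 1/(1+np.exp(-np.dot(layer_1,synapse_1)))
    layer_2_delta = (layer_2 - y)*(layer_2*(1-layer_2))
    layer_1_delta = layer_2_delta.dot(synapse_1.T) * (layer_1 * (1-layer_1))
    synapse_1 -= alpha * layer_1.T.dot(layer_2_delta)
    synapse_0 -= alpha * X.T.dot(layer_1_delta)

With batch_size = 1 this becomes stochastic gradient descent, and with batch_size equal to the number of examples it reduces to the original batch version.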

2 Answers


This function returns the mini-batches given the inputs and targets:

def iterate_minibatches(inputs, targets, batchsize, shuffle=False):
    assert inputs.shape[0] == targets.shape[0]
    if shuffle:
        # shuffle indices once per epoch so every batch is a random subset
        indices = np.arange(inputs.shape[0])
        np.random.shuffle(indices)
    # note: leftover examples that do not fill a complete batch are dropped
    for start_idx in range(0, inputs.shape[0] - batchsize + 1, batchsize):
        if shuffle:
            excerpt = indices[start_idx:start_idx + batchsize]
        else:
            excerpt = slice(start_idx, start_idx + batchsize)
        yield inputs[excerpt], targets[excerpt]

and this shows how to use it for training:

for n in range(n_epochs):
    for x_batch, y_batch in iterate_minibatches(X, Y, batch_size, shuffle=True):
        l_train, acc_train = f_train(x_batch, y_batch)

    l_val, acc_val = f_val(Xt, Yt)
    logging.info('epoch %d, train_loss %f, acc %f, val_loss %f, acc %f',
                 n, l_train, acc_train, l_val, acc_val)

Obviously you need to define the f_train, f_val and other functions yourself given the optimisation library (e.g. Lasagne, Keras) you are using.
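
If you are not using such a library, a minimal plain-NumPy stand-in for f_train and f_val could look like the sketch below, reusing the question's two-layer network (the function bodies, the mean-squared-error loss and the 0.5 accuracy threshold are assumptions for illustration, not part of the original answer; synapse_0, synapse_1 and alpha come from the question's code):

def f_train(x_batch, y_batch):
    # one gradient step computed on this mini-batch only
    global synapse_0, synapse_1
    layer_1 = 1/(1+np.exp(-np.dot(x_batch, synapse_0)))
    layer_2 = 1/(1+np.exp(-np.dot(layer_1, synapse_1)))
    layer_2_delta = (layer_2 - y_batch)*(layer_2*(1-layer_2))
    layer_1_delta = layer_2_delta.dot(synapse_1.T) * (layer_1 * (1-layer_1))
    synapse_1 -= alpha * layer_1.T.dot(layer_2_delta)
    synapse_0 -= alpha * x_batch.T.dot(layer_1_delta)
    loss = np.mean((layer_2 - y_batch)**2)          # mean squared error
    acc = np.mean((layer_2 > 0.5) == y_batch)       # predictions thresholded at 0.5
    return loss, acc

def f_val(x, y):
    # forward pass only, no weight update
    layer_1 = 1/(1+np.exp(-np.dot(x, synapse_0)))
    layer_2 = 1/(1+np.exp(-np.dot(layer_1, synapse_1)))
    return np.mean((layer_2 - y)**2), np.mean((layer_2 > 0.5) == y)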

Ash

The following function returns (yields) mini-batches. It is based on the function provided by Ash, but correctly handles the last minibatch.

def iterate_minibatches(inputs, targets, batchsize, shuffle=False):
    assert inputs.shape[0] == targets.shape[0]
    if shuffle:
        indices = np.arange(inputs.shape[0])
        np.random.shuffle(indices)
    for start_idx in range(0, inputs.shape[0], batchsize):
        # the last batch may be smaller than batchsize instead of being dropped
        end_idx = min(start_idx + batchsize, inputs.shape[0])
        if shuffle:
            excerpt = indices[start_idx:end_idx]
        else:
            excerpt = slice(start_idx, end_idx)
        yield inputs[excerpt], targets[excerpt]
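
For example (an illustrative check, not part of the original answer), with the question's 4-example dataset and batchsize=3 this version yields a batch of 3 followed by a batch of 1 instead of silently dropping the leftover row; passing batchsize=1 to either generator gives plain stochastic gradient descent:

for x_batch, y_batch in iterate_minibatches(X, y, 3, shuffle=False):
    print(x_batch.shape, y_batch.shape)   # (3, 3) (3, 1), then (1, 3) (1, 1)
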
dsachar