I'm trying to compare the convergence rates of the SGD and GD algorithms for neural networks. In PyTorch, we often use the SGD optimizer as follows:
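
    import torch

    # model, loss, train_dataset and epochs are defined elsewhere
    train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

    for epoch in range(epochs):
        running_loss = 0
        for input_batch, labels_batch in train_dataloader:
            inputs = input_batch
            y_hat = model(inputs)      # forward pass on one mini-batch
            y = labels_batch
            L = loss(y_hat, y)
            optimizer.zero_grad()      # clear gradients from the previous step
            L.backward()               # gradients of the mini-batch loss
            optimizer.step()           # one parameter update per mini-batch
            running_loss += L.item()
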
My understanding is that the SGD optimizer here actually performs mini-batch gradient descent, because we feed it one batch of data at a time and it takes one update step per batch. So, if we set the batch_size parameter to the size of the whole dataset, every step uses the gradient over all the data, and the code actually performs (full-batch) gradient descent for the neural network.
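
To make the full-batch case concrete, here is a minimal sketch of what I have in mind. The toy data, the `TensorDataset`, the linear model and the MSE loss are placeholders I made up for illustration and are not my real setup. It assumes the loss averages over the batch (the default for `MSELoss`) and that the optimizer uses no momentum. It checks that the loader yields a single batch per epoch and that the gradient from that batch matches the gradient computed directly on the whole dataset.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical toy data and model, used only for illustration
X = torch.randn(256, 10)
y = torch.randn(256, 1)
train_dataset = TensorDataset(X, y)
model = torch.nn.Linear(10, 1)
loss = torch.nn.MSELoss()  # default reduction="mean" averages over the batch

# Reference: gradient of the loss over the entire dataset, no DataLoader involved
model.zero_grad()
loss(model(X), y).backward()
full_grad = model.weight.grad.clone()

# batch_size = dataset size, so each epoch consists of exactly one batch
full_loader = DataLoader(train_dataset, batch_size=len(train_dataset), shuffle=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

steps_per_epoch = 0
for input_batch, labels_batch in full_loader:
    optimizer.zero_grad()
    L = loss(model(input_batch), labels_batch)
    L.backward()
    loader_grad = model.weight.grad.clone()  # gradient used for this update
    optimizer.step()
    steps_per_epoch += 1

# Shuffling only reorders the samples, so the averaged gradient is unchanged
same_gradient = torch.allclose(loader_grad, full_grad, atol=1e-6)
print(steps_per_epoch, same_gradient)
```

If this reasoning holds, the only difference from the mini-batch loop above is the `batch_size` argument; the optimizer itself is unchanged.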
Is my understanding correct?