I came across this code online and I was wondering if I interpreted it correctly. Below is a part of a gradient descent process. full code available through the link https://jovian.ml/aakashns/03-logistic-regression. My question is as followed: During the training step, I guess the author is trying to minimize the loss for each batch by updating the parameters. However, how can we be sure the total loss of all training samples is minimized if loss.backward() is only applied to the batch loss?
def fit(epochs, lr, model, train_loader, val_loader, opt_func=torch.optim.SGD):
history = []
optimizer = opt_func(model.parameters(), lr)
for epoch in range(epochs):
# Training Phase
for batch in train_loader:
loss = model.training_step(batch)
loss.backward()
optimizer.step()
optimizer.zero_grad()
# Validation phase
result = evaluate(model, val_loader)
model.epoch_end(epoch, result)
history.append(result)
return history